Monitoring Amazon EMR events with CloudWatch
Amazon EMR tracks events and keeps information about them for up to seven days in the Amazon EMR console. Amazon EMR records events when there is a change in the state of clusters, instance groups, instance fleets, automatic scaling policies, or steps. Events capture the date and time the event occurred, details about the affected elements, and other critical data points.
The following tables list Amazon EMR events, along with the state or state change that each event indicates, the severity of the event, the event type, the event code, and the event message. Amazon EMR represents events as JSON objects and automatically sends them to an event stream. The JSON object matters when you set up rules for event processing with CloudWatch Events, because a rule works by matching patterns against the JSON object. For more information, see Events and event patterns and Amazon EMR events in the Amazon CloudWatch Events User Guide.
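As a rough illustration of how a rule matches patterns in the JSON object, the following Python sketch builds an event pattern and checks a sample event against it. The `source` and `detail-type` values are the ones commonly documented for Amazon EMR cluster events, but treat them, the `detail` field names, and the sample event as assumptions to verify against the CloudWatch Events User Guide. The `matches` helper is hypothetical and covers only exact-value matching, not the full pattern syntax.

```python
import json

# Pattern for a rule that should fire when a cluster terminates.
# Assumption: EMR events use source "aws.emr" and carry the state in detail.state.
EVENT_PATTERN = {
    "source": ["aws.emr"],
    "detail-type": ["EMR Cluster State Change"],
    "detail": {"state": ["TERMINATED", "TERMINATED_WITH_ERRORS"]},
}


def matches(pattern, event):
    """Return True if every field named in the pattern matches the event.

    Only exact-value matching is implemented: a list in the pattern matches
    when the event value equals one of its items, and a dict is matched
    recursively against the nested object.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical event, trimmed to the fields that the pattern inspects.
SAMPLE_EVENT = {
    "source": "aws.emr",
    "detail-type": "EMR Cluster State Change",
    "detail": {
        "clusterId": "j-EXAMPLE",
        "state": "TERMINATED_WITH_ERRORS",
        "severity": "CRITICAL",
    },
}

if __name__ == "__main__":
    print(json.dumps(EVENT_PATTERN, indent=2))
    print(matches(EVENT_PATTERN, SAMPLE_EVENT))
```

The printed JSON is the form that you would supply as the event pattern of a rule. Fields that the pattern doesn't mention, such as `clusterId` here, are ignored during matching.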
Note
To ensure that we provide you with the most pertinent information, we continuously refine our error messages. For that reason, we recommend that you don’t parse the text from the messages to initiate next actions in your workflow.
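One way to follow this recommendation is to branch on structured fields and carry the message along only for human readers. The sketch below is a minimal example under assumptions: it supposes that the severity is available as `detail.severity` in the event JSON, and the action names (`page`, `ticket`, `log`) are hypothetical placeholders for whatever your workflow does next. Both `WARN` and `WARNING` are handled because both spellings appear in the tables on this page.

```python
def next_action(event):
    """Choose a follow-up action from structured fields only.

    The message text is never inspected, so wording changes in the
    messages do not affect the decision.
    """
    detail = event.get("detail", {})
    severity = detail.get("severity", "")  # assumed field name
    if severity in ("CRITICAL", "ERROR"):
        return "page"
    if severity in ("WARN", "WARNING"):
        return "ticket"
    return "log"


if __name__ == "__main__":
    sample = {"detail": {"severity": "WARN", "message": "free-form text"}}
    print(next_action(sample))
```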
Cluster start events
State or state change | Severity | Event type | Event code | Message |
---|---|---|---|---|
CREATING | WARN | EMR instance fleet provisioning | EC2 provisioning - Insufficient Instance Capacity | We are not able to create your Amazon EMR cluster ClusterId (ClusterName) for Instance Fleet InstanceFleetID. Amazon EC2 has insufficient Spot capacity for Instance type [Instancetype1, Instancetype2] and insufficient On-Demand capacity for Instance type [Instancetype3, Instancetype4] in Availability Zone [AvailabilityZone1, AvailabilityZone2]. Check the documentation for more information on how to respond to this event. |
CREATING | WARN | EMR instance group provisioning | EC2 provisioning - Insufficient Instance Capacity | We are not able to create your Amazon EMR cluster ClusterId (ClusterName) for Instance Group InstanceGroupID. Amazon EC2 has insufficient Spot capacity for Instance type [Instancetype1, Instancetype2] and insufficient On-Demand capacity for Instance type [Instancetype3, Instancetype4] in Availability Zone [AvailabilityZone1, AvailabilityZone2]. Check the documentation for more information on how to respond to this event. |
CREATING | WARN | EMR instance fleet provisioning | EC2 provisioning - Insufficient Free Addresses In Subnet | We can’t create the Amazon EMR cluster ClusterId (ClusterName) that you requested for instance fleet InstanceFleetID because the specified subnet [Subnet1, Subnet2] doesn't contain enough free private IP addresses to fulfill your request. Use the DescribeSubnets operation to see how many IP addresses are available (unused) in your subnet. For information on how to respond to this event, see Error codes for the Amazon EC2 API. |
CREATING | WARN | EMR instance group provisioning | EC2 provisioning - Insufficient Free Addresses In Subnet | We can’t create the Amazon EMR cluster ClusterId (ClusterName) that you requested for instance group InstanceGroupID because the specified subnet [Subnet1, Subnet2] doesn't contain enough free private IP addresses to fulfill your request. Use the DescribeSubnets operation to see how many IP addresses are available (unused) in your subnet. For information on how to respond to this event, see Error codes for the Amazon EC2 API. |
CREATING | WARN | EMR instance fleet provisioning | EC2 Provisioning – vCPU Limit Exceeded | The provision of InstanceFleetID in the Amazon EMR cluster ClusterId (ClusterName) is delayed because you've reached the limit on the number of vCPUs (virtual processing units) assigned to the running instances in your account (accountId). For more information, see Error codes for the Amazon EC2 API. |
CREATING | WARN | EMR instance group provisioning | EC2 Provisioning – vCPU Limit Exceeded | The provision of instance group InstanceGroupID in the Amazon EMR cluster ClusterId is delayed because you've reached the limit on the number of vCPUs (virtual processing units) assigned to the running instances in your account (accountId). For more information, see Error codes for the Amazon EC2 API. |
CREATING | WARN | EMR instance fleet provisioning | EC2 Provisioning – Spot Instance Count Limit Exceeded | The provision of instance fleet InstanceFleetID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of Spot Instances that you can launch in your account (accountId). For more information, see Error codes for the Amazon EC2 API. |
CREATING | WARN | EMR instance group provisioning | EC2 Provisioning – Spot Instance Count Limit Exceeded | The provision of instance group InstanceGroupID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of Spot Instances that you can launch in your account (accountId). For more information, see Error codes for the Amazon EC2 API. |
CREATING | WARN | EMR instance fleet provisioning | EC2 Provisioning – Instance Limit Exceeded | The provision of instance fleet InstanceFleetID in the Amazon EMR cluster ClusterId (ClusterName) is delayed because you've reached the limit on the number of instances you can run concurrently in your account (accountID). For more information on Amazon EC2 service limits, see Error codes for the Amazon EC2 API. |
CREATING | WARN | EMR instance group provisioning | EC2 Provisioning – Instance Limit Exceeded | The provision of instance group InstanceGroupID in the Amazon EMR cluster ClusterId (ClusterName) is delayed because you've reached the limit on the number of instances you can run concurrently in your account (accountID). For more information on Amazon EC2 service limits, see Error codes for the Amazon EC2 API. |
CREATING | WARN | EMR instance group provisioning | none | Amazon EMR cluster - or - Amazon EMR cluster Note: A cluster in the |
STARTING | INFO | EMR cluster state change | none | Amazon EMR cluster |
STARTING | INFO | EMR cluster state change | none | Note: Applies only to clusters with the instance fleets configuration and multiple Availability Zones selected within Amazon EC2. Amazon EMR cluster |
STARTING | INFO | EMR cluster state change | none | Amazon EMR cluster |
WAITING | INFO | EMR cluster state change | none | Amazon EMR cluster - or - Amazon EMR cluster Note: A cluster in the |
Note
Events with the event code EC2 provisioning - Insufficient Instance Capacity are emitted periodically when your EMR cluster encounters an insufficient capacity error from Amazon EC2 for your instance fleet or instance group during a cluster creation or resize operation. For information on how to respond to these events, see Responding to Amazon EMR cluster insufficient instance capacity events.
Cluster termination events
State or state change | Severity | Event type | Event code | Message |
---|---|---|---|---|
TERMINATED | The severity depends on the reason for the state change, as shown in the following: | EMR cluster state change | none | Amazon EMR Cluster |
TERMINATED_WITH_ERRORS | CRITICAL | EMR cluster state change | none | Amazon EMR Cluster |
TERMINATED_WITH_ERRORS | CRITICAL | EMR cluster state change | none | Amazon EMR Cluster |
Instance fleet state-change events
Note
The instance fleets configuration is available only in Amazon EMR releases 4.8.0 and later, excluding 5.0.0 and 5.0.3.
State or state change | Severity | Event type | Event code | Message |
---|---|---|---|---|
From | INFO | none | Provisioning for instance fleet | |
From | INFO | none | A resize for instance fleet | |
From | INFO | none | The resizing operation for instance fleet | |
From | INFO | none | The resizing operation for instance fleet | |
SUSPENDED | ERROR | none | Instance fleet | |
RESIZING | WARNING | none | The resizing operation for instance fleet | |
| INFO | none | The resizing operation for instance fleet | |
| INFO | none | A resizing operation for instance fleet | |
Instance fleet resize events
Event type | Severity | Event code | Message |
---|---|---|---|
EMR instance fleet resize | ERROR | Spot Provisioning timeout | The Resize operation for Instance Fleet |
EMR instance fleet resize | ERROR | On-Demand Provisioning timeout | The Resize operation for Instance Fleet |
EMR instance fleet resize | WARNING | EC2 provisioning - Insufficient Instance Capacity | We are not able to complete the resize operation for Instance Fleet |
EMR instance fleet resize | WARNING | Spot Provisioning Timeout - Continuing Resize | We're still provisioning Spot capacity for the Instance Fleet resize operation that initiated at |
EMR instance fleet resize | WARNING | On-Demand Provisioning Timeout - Continuing Resize | We're still provisioning On-Demand capacity for the Instance Fleet resize operation that initiated at |
EMR instance fleet resize | WARNING | EC2 Provisioning - Insufficient Free Address in Subnet | We can't complete the resize operation for instance fleet InstanceFleetID in Amazon EMR cluster ClusterId (ClusterName) because the specified subnet [Subnet1, Subnet2] doesn't contain enough free private IP addresses to fulfill your request. Use the DescribeSubnets operation to view how many IP addresses are available (unused) in your subnet. For information on how to respond to this event, see Error codes for the Amazon EC2 API. |
EMR instance fleet resize | WARNING | EC2 Provisioning - vCPU Limit Exceeded | The resize of instance fleet InstanceFleetID in the Amazon EMR cluster ClusterName is delayed because you've reached the limit on the number of vCPUs (virtual processing units) assigned to the running instances in your account (accountId). For more information, see Error codes for the Amazon EC2 API. |
EMR instance fleet resize | WARNING | EC2 Provisioning - Spot Instance Count Limit Exceeded | The provision of instance fleet InstanceFleetID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of Spot Instances that you can launch in your account (accountId). For more information, see Error codes for the Amazon EC2 API. |
EMR instance fleet resize | WARNING | EC2 Provisioning - Instance Limit Exceeded | The provision of instance fleet InstanceFleetID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of On-Demand instances you can run in your account (accountId). For more information, see Error codes for the Amazon EC2 API. |
Note
The provisioning timeout events are emitted when Amazon EMR stops provisioning Spot or On-Demand capacity for the fleet after the timeout expires. For information on how to respond to these events, see Responding to Amazon EMR cluster instance fleet resize timeout events.
Instance group events
Event type | Severity | Event code | Message |
---|---|---|---|
From | INFO | none | The resizing operation for instance group |
From | INFO | none | A resize for instance group |
SUSPENDED | ERROR | none | Instance group |
RESIZING | WARNING | none | The resizing operation for instance group |
EMR instance group resize | WARNING | EC2 provisioning - Insufficient Instance Capacity | We are not able to complete the resize operation that started at |
EMR instance group resize | WARNING | EC2 Provisioning - Insufficient Free Address in Subnet | We can't complete the resize operation for instance group InstanceGroupID in Amazon EMR cluster ClusterId (ClusterName) because the specified subnet [Subnet1, Subnet2] doesn't contain enough free private IP addresses to fulfill your request. Use the DescribeSubnets operation to view how many IP addresses are available (unused) in your subnet. For information on how to respond to this event, see Error codes for the Amazon EC2 API. |
EMR instance group resize | WARNING | EC2 Provisioning - vCPU Limit Exceeded | The resize of instance group InstanceGroupID in the Amazon EMR cluster ClusterName is delayed because you've reached the limit on the number of vCPUs (virtual processing units) assigned to the running instances in your account (accountId). For more information, see Error codes for the Amazon EC2 API. |
EMR instance group resize | WARNING | EC2 Provisioning - Spot Instance Count Limit Exceeded | The provision of instance group InstanceGroupID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of Spot Instances that you can launch in your account (accountId). For more information, see Error codes for the Amazon EC2 API. |
EMR instance group resize | WARNING | EC2 Provisioning - Instance Limit Exceeded | The provision of instance group InstanceGroupID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of On-Demand instances you can run in your account (accountId). For more information, see Error codes for the Amazon EC2 API. |
From | INFO | none | A resize for instance group |
Note
With Amazon EMR version 5.21.0 and later, you can override cluster configurations and specify additional configuration classifications for each instance group in a running cluster. You do this by using the Amazon EMR console, the AWS Command Line Interface (AWS CLI), or the AWS SDK. For more information, see Supplying a Configuration for an Instance Group in a Running Cluster.
The following table lists Amazon EMR events for the reconfiguration operation, along with the state or state change that the event indicates, the severity of the event, and event messages.
State or state change | Severity | Message |
---|---|---|
RUNNING | INFO | A reconfiguration for instance group |
From | INFO | The reconfiguration operation for instance group |
From | INFO | A reconfiguration for instance group |
RESIZING | INFO | Reconfiguring operation towards configuration version |
RECONFIGURING | INFO | Resizing operation towards instance count Num for instance group InstanceGroupID in the Amazon EMR cluster ClusterId (ClusterName) is temporarily blocked at Time because the instance group is in State. |
RECONFIGURING | WARNING | The reconfiguration operation for instance group |
RECONFIGURING | INFO | Configurations are reverting to the previous successful version number |
From | INFO | Configurations were successfully reverted to the previous successful version |
From | CRITICAL | Failed to revert to the previous successful version |
Automatic scaling policy events
State or state change | Severity | Message |
---|---|---|
PENDING | INFO | An Auto Scaling policy was added to instance group - or - The Auto Scaling policy for instance group |
ATTACHED | INFO | The Auto Scaling policy for instance group |
| INFO | The Auto Scaling policy for instance group |
FAILED | ERROR | The Auto Scaling policy for instance group - or - The Auto Scaling policy for instance group |
Step events
State or state change | Severity | Message |
---|---|---|
PENDING | INFO | Step |
CANCEL_PENDING | WARN | Step |
RUNNING | INFO | Step |
COMPLETED | INFO | Step |
CANCELLED | WARN | Cancellation request has succeeded for cluster step |
FAILED | ERROR | Step |
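If you process step events in code, the state-to-severity mapping in the table above can be encoded as a lookup. The following Python sketch does that and flags the states that might deserve attention. The `needs_attention` helper and its treatment of unknown states are illustrative assumptions, not behavior defined by Amazon EMR.

```python
# Severity of each step state, copied from the step events table.
STEP_SEVERITY = {
    "PENDING": "INFO",
    "CANCEL_PENDING": "WARN",
    "RUNNING": "INFO",
    "COMPLETED": "INFO",
    "CANCELLED": "WARN",
    "FAILED": "ERROR",
}


def needs_attention(state):
    """Return True for step states whose severity is above INFO.

    States that are not in the table are also flagged, so that an
    unexpected state is never silently ignored.
    """
    return STEP_SEVERITY.get(state, "UNKNOWN") != "INFO"


if __name__ == "__main__":
    for state in STEP_SEVERITY:
        print(state, needs_attention(state))
```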
Unhealthy node replacement events
Event type | Severity | Event code | Message |
---|---|---|---|
Amazon EMR unhealthy node replacement | INFO | Unhealthy core node detected | Amazon EMR has identified that core instance |
Amazon EMR unhealthy node replacement | INFO | Core node unhealthy - replacement disabled | Amazon EMR has identified that core instance |
Amazon EMR unhealthy node replacement | WARN | Unhealthy core node not replaced | Amazon EMR can't replace your Note: The reason why Amazon EMR can't replace your core node differs depending on your scenario. For example, one reason why Amazon EMR can't delete a node is that the cluster wouldn't have any remaining core nodes. |
Amazon EMR unhealthy node replacement | INFO | Unhealthy core node recovered | Amazon EMR has recovered your |
For more information about unhealthy node replacement, see Replacing unhealthy nodes.
Viewing events with the Amazon EMR console
For each cluster, you can view a simple list of events in the details pane, which lists events in descending order of occurrence. You can also view all events for all clusters in a Region in descending order of occurrence.
If you don't want a user to see all cluster events for a Region, add a statement that denies permission ("Effect": "Deny") for the elasticmapreduce:ViewEventsFromAllClustersInConsole action to a policy that is attached to the user.
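A deny statement of this kind might look like the policy document that the following Python sketch prints. The action name and the effect come from the text above. The `"Resource": "*"` element and the `policy_document` helper are assumptions made for illustration, so adjust the statement to fit your own policies.

```python
import json

# Policy with one statement that denies the console action named above.
DENY_ALL_CLUSTER_EVENTS = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "elasticmapreduce:ViewEventsFromAllClustersInConsole",
            "Resource": "*",  # assumption: apply the deny to all resources
        }
    ],
}


def policy_document():
    """Return the policy as a JSON string suitable for pasting into IAM."""
    return json.dumps(DENY_ALL_CLUSTER_EVENTS, indent=2)


if __name__ == "__main__":
    print(policy_document())
```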
To view events for all clusters in a Region with the console
- Sign in to the AWS Management Console, and open the Amazon EMR console at https://console.aws.amazon.com/emr.
- Under EMR on EC2 in the left navigation pane, choose Events.
To view events for a particular cluster with the console
- Sign in to the AWS Management Console, and open the Amazon EMR console at https://console.aws.amazon.com/emr.
- Under EMR on EC2 in the left navigation pane, choose Clusters, and then choose a cluster.
- To view all of your events, select the Events tab on the cluster details page.