Common errors when running jobs
The following errors may occur when you run the StartJobRun API. The table lists each error and provides mitigation steps so you can address issues quickly.
Error Message | Error Condition | Recommended Next Step |
---|---|---|
error: argument -- | Required parameters are missing. | Add the missing arguments to the API request. |
An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: ARN is not authorized to perform: emr-containers:StartJobRun | Execution role is missing. | See Using job execution roles with Amazon EMR on EKS. |
An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: | Caller doesn't have permission to the execution role [valid / not valid format] via condition keys. | See Using job execution roles with Amazon EMR on EKS. |
An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: | The job submitter and the execution role ARN are from different accounts. | Ensure that the job submitter and the execution role ARN are from the same AWS account. |
1 validation error detected: Value | Caller has permissions for the execution role via condition keys, but the role does not satisfy the constraints of the ARN format. | Provide the execution role following the ARN format. See Using job execution roles with Amazon EMR on EKS. |
An error occurred (ResourceNotFoundException) when calling the StartJobRun operation: Virtual cluster | Virtual cluster ID is not found. | Provide a virtual cluster ID registered with Amazon EMR on EKS. |
An error occurred (ValidationException) when calling the StartJobRun operation: Virtual cluster state | The virtual cluster is not ready to run the job. | See Virtual cluster states. |
An error occurred (ResourceNotFoundException) when calling the StartJobRun operation: Release | The release specified in the job submission is incorrect. | See Amazon EMR on EKS releases. |
An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: | User is not authorized to call StartJobRun. | See Using job execution roles with Amazon EMR on EKS. |
An error occurred (ValidationException) when calling the StartJobRun operation: configurationOverrides.monitoringConfiguration.s3MonitoringConfiguration.logUri failed to satisfy constraint : %s | S3 path URI syntax is not valid. | logUri should be in the format of s3://... |
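
Several of the errors above (missing parameters, a malformed execution role ARN, a cross-account role, a bad logUri) can be caught locally before the request is sent. The following is a minimal sketch, not part of the service: the function name `preflight_start_job_run`, the simplified ARN pattern, and the set of parameters treated as required are all assumptions made for illustration, and the service's own validation may differ (for example, when job templates are used).

```python
import re

# Simplified IAM role ARN pattern. This is an assumption for illustration;
# the service's own validation may be stricter or looser.
ROLE_ARN_RE = re.compile(r"^arn:(aws[a-zA-Z-]*):iam::(\d{12}):role/.+$")


def preflight_start_job_run(params, caller_account_id):
    """Return a list of problems found in a StartJobRun-style request dict."""
    problems = []

    # Parameters this sketch treats as required (an assumption; adjust as needed).
    for key in ("virtualClusterId", "executionRoleArn", "releaseLabel", "jobDriver"):
        if not params.get(key):
            problems.append(f"missing required parameter: {key}")

    role_arn = params.get("executionRoleArn", "")
    if role_arn:
        match = ROLE_ARN_RE.match(role_arn)
        if match is None:
            problems.append("executionRoleArn does not follow the ARN format")
        elif match.group(2) != caller_account_id:
            # The job submitter and the execution role must share an account.
            problems.append("executionRoleArn belongs to a different AWS account")

    # Path taken from the error message:
    # configurationOverrides.monitoringConfiguration.s3MonitoringConfiguration.logUri
    log_uri = (
        params.get("configurationOverrides", {})
        .get("monitoringConfiguration", {})
        .get("s3MonitoringConfiguration", {})
        .get("logUri")
    )
    if log_uri is not None and not log_uri.startswith("s3://"):
        problems.append("logUri should start with s3://")

    return problems
```

Calling `preflight_start_job_run(request, "111122223333")` with a placeholder account ID returns an empty list when none of these checks fail. Permission problems (the AccessDeniedException rows) cannot be detected this way and still need to be resolved in IAM.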
The following errors may occur when you run the DescribeJobRun API before the job runs.
Error Message | Error Condition | Recommended Next Step |
---|---|---|
stateDetails: JobRun submission failed. Classification failureReason: VALIDATION_ERROR state: FAILED. | Parameters in StartJobRun are not valid. | See Amazon EMR on EKS releases. |
stateDetails: Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | The EKS cluster is not available. | Check if the EKS cluster exists and has the right permissions. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | Amazon EMR does not have permissions to access the EKS cluster. | Verify that permissions are set up for Amazon EMR on the registered namespace. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | The EKS cluster is not reachable. | Check if the EKS cluster exists and has the right permissions. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: JobRun submission failed due to an internal error. failureReason: INTERNAL_ERROR state: FAILED | An internal error has occurred with the EKS cluster. | N/A |
stateDetails: Cluster failureReason: USER_ERROR state: FAILED | There are insufficient resources in the EKS cluster to run the job. | Add more capacity to the EKS node group or set up EKS Autoscaler. For more information, see Cluster Autoscaler. |
The following errors may occur when you run the DescribeJobRun API after the job runs.
Error Message | Error Condition | Recommended Next Step |
---|---|---|
stateDetails: Trouble monitoring your JobRun. Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | The EKS cluster does not exist. | Check if the EKS cluster exists and has the right permissions. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: Trouble monitoring your JobRun. Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | Amazon EMR does not have permissions to access the EKS cluster. | Verify that permissions are set up for Amazon EMR on the registered namespace. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: Trouble monitoring your JobRun. Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | The EKS cluster is not reachable. | Check if the EKS cluster exists and has the right permissions. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: Trouble monitoring your JobRun due to an internal error failureReason: INTERNAL_ERROR state: FAILED | An internal error has occurred and is preventing JobRun monitoring. | N/A |
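
The DescribeJobRun tables above pair each failureReason value with a next step, so the lookup can be automated. The sketch below assumes the response is a dictionary with a `jobRun` object containing `state`, `failureReason`, and `stateDetails` fields. The `NEXT_STEPS` mapping and the `triage_job_run` helper are hypothetical names, and the guidance strings are paraphrased from the tables rather than returned by the service.

```python
# Next steps keyed by failureReason, paraphrased from the tables above.
NEXT_STEPS = {
    "VALIDATION_ERROR": "Check the StartJobRun parameters, including the release.",
    "CLUSTER_UNAVAILABLE": (
        "Check that the EKS cluster exists and that Amazon EMR has "
        "permissions on the registered namespace."
    ),
    "USER_ERROR": "Add capacity to the EKS node group or set up autoscaling.",
    "INTERNAL_ERROR": "No recommended action is listed.",
}


def triage_job_run(response):
    """Return a one-line hint for a failed job run, or None if it did not fail."""
    job_run = response.get("jobRun", {})
    if job_run.get("state") != "FAILED":
        return None
    reason = job_run.get("failureReason", "UNKNOWN")
    step = NEXT_STEPS.get(reason, "No guidance listed for this failure reason.")
    details = job_run.get("stateDetails", "")
    return f"{reason}: {step} (stateDetails: {details})"
```

Because CLUSTER_UNAVAILABLE covers several distinct conditions (missing cluster, missing permissions, unreachable cluster), the hint deliberately lists more than one check instead of guessing which one applies.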
The following error may occur when a job cannot start and the job waits in the SUBMITTED state for 15 minutes. This can be caused by a lack of cluster resources.
Error Message | Error Condition | Recommended Next Step |
---|---|---|
cluster timeout | The job has been in the SUBMITTED state for 15 minutes or more. | You can override the default setting of 15 minutes for this parameter with the configuration override shown below. |
Use the following configuration to change the cluster timeout setting to 30 minutes. Notice that you provide the new job-start-timeout value in seconds:
{ "configurationOverrides": { "applicationConfiguration": [{ "classification": "emr-containers-defaults", "properties": { "job-start-timeout":"1800" } }] } }
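
Since job-start-timeout is given in seconds and as a string, it is easy to pass minutes by mistake. The following sketch builds the same override from a value in minutes. The helper name `job_start_timeout_override` is hypothetical, and the structure simply mirrors the JSON above.

```python
import json


def job_start_timeout_override(minutes):
    """Build the configuration override for a job start timeout given in minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    seconds = int(minutes * 60)  # the property is expressed in seconds
    return {
        "configurationOverrides": {
            "applicationConfiguration": [
                {
                    "classification": "emr-containers-defaults",
                    # Property values are strings, as in the JSON above.
                    "properties": {"job-start-timeout": str(seconds)},
                }
            ]
        }
    }


# 30 minutes corresponds to the "1800" value shown above.
print(json.dumps(job_start_timeout_override(30), indent=2))
```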