Job concurrency and queuing - Amazon EMR

Job concurrency and queuing

With Amazon EMR release 7.0.0 and later, you can specify a job run queue timeout and a concurrency configuration for your application. When you specify this configuration, Amazon EMR Serverless first queues each job and then begins execution based on concurrency utilization on your application. For example, if your job run concurrency is 10, at most ten jobs run at a time on your application. Remaining jobs are queued until one of the running jobs terminates. If a job reaches the queue timeout before it starts running, the job times out. For more information, see Job run states.
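The arithmetic behind the example above can be sketched in a few lines of shell. This is only an illustration of the described behavior, not how the service is implemented; split_job_runs is a hypothetical helper and not part of the AWS CLI.

```shell
# Illustrative only: given a number of submitted jobs and the application's
# maxConcurrentRuns value, report how many jobs run immediately and how many
# wait in the queue. split_job_runs is a hypothetical helper.
split_job_runs() {
  submitted=$1
  max_concurrent=$2
  if [ "$submitted" -le "$max_concurrent" ]; then
    running=$submitted
  else
    running=$max_concurrent
  fi
  queued=$((submitted - running))
  echo "running=$running queued=$queued"
}

split_job_runs 15 10   # prints: running=10 queued=5
```

Each queued job starts only when a running job terminates, and times out if it waits longer than the configured queue timeout.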

Key benefits of concurrency and queuing

Job concurrency and queuing provides the following benefits when many job submissions are required:

  • It helps you control the number of concurrently executing jobs so that you use your application-level capacity limits efficiently.

  • The queue can absorb a sudden burst of job submissions, with a configurable timeout setting.

Getting started with concurrency and queuing

The following procedures show two ways to implement concurrency and queuing.

Using the AWS CLI

  1. Create an Amazon EMR Serverless application with queue timeout and concurrent job runs:

    aws emr-serverless create-application \
      --release-label emr-7.0.0 \
      --type SPARK \
      --scheduler-configuration '{"maxConcurrentRuns": 1, "queueTimeoutMinutes": 30}'

  2. Update an application to change the job queue timeout and concurrency:

    aws emr-serverless update-application \
      --application-id application-id \
      --scheduler-configuration '{"maxConcurrentRuns": 5, "queueTimeoutMinutes": 30}'

    Note

    You can update your existing application to enable job concurrency and queuing. To do this, the application must have a release label emr-7.0.0 or later.
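If you generate these commands from a script, quoting the JSON by hand is easy to get wrong. The following is a minimal sketch that builds the --scheduler-configuration value from two numbers. build_scheduler_config is a hypothetical helper name, not an AWS CLI command; the JSON keys are the ones used in the commands above.

```shell
# build_scheduler_config is a hypothetical helper, not part of the AWS CLI.
# It prints the JSON document passed to --scheduler-configuration.
#   $1: maximum concurrent job runs
#   $2: queue timeout in minutes
build_scheduler_config() {
  printf '{"maxConcurrentRuns": %d, "queueTimeoutMinutes": %d}\n' "$1" "$2"
}

# Example usage (requires AWS credentials; application-id is a placeholder):
# aws emr-serverless update-application \
#   --application-id application-id \
#   --scheduler-configuration "$(build_scheduler_config 5 30)"
```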

Using the AWS Management Console

The following steps show you how to get started with job concurrency and queuing, using the AWS Management Console:

  1. Go to EMR Studio and choose to create an application with release label emr-7.0.0 or later.

  2. Under Application setup options, select the option Use custom settings.

  3. Under Additional configurations, find the Job Run Settings section. Select Enable job concurrency to enable the feature.

  4. After you enable the feature, set Concurrent job runs and Queue timeout to configure the number of concurrent job runs and the queue timeout, respectively. If you do not enter values for these settings, the default values are used.

  5. Choose Create Application to create the application with this feature enabled. To verify, go to the dashboard, select your application, and check the properties tab to confirm that the feature is enabled.

After you configure the application, you can submit jobs with this feature enabled.
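After you submit a job, you might want to check whether it is still waiting in the queue. The following is a minimal sketch that wraps the aws emr-serverless get-job-run command and prints only the job run state, for example QUEUED or RUNNING. job_run_state is a hypothetical wrapper name, and both IDs in the usage comment are placeholders.

```shell
# job_run_state is a hypothetical wrapper around `aws emr-serverless get-job-run`.
# It prints the current state of a job run as plain text.
#   $1: application ID
#   $2: job run ID
job_run_state() {
  aws emr-serverless get-job-run \
    --application-id "$1" \
    --job-run-id "$2" \
    --query 'jobRun.state' \
    --output text
}

# Example usage (requires AWS credentials; both IDs are placeholders):
# job_run_state application-id job-run-id
```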

Considerations for concurrency and queuing

Take the following into consideration when you implement concurrency and queuing:

  • Job queuing and concurrency are supported on Amazon EMR release 7.0.0 and later.

  • You can update concurrency for an application in the STARTED state.

  • The valid range for maxConcurrentRuns is 1 to 1000, and for queueTimeoutMinutes it is 15 to 720.

  • A maximum of 2000 jobs can be in the QUEUED state for an account.

  • Concurrency and queuing apply to batch and streaming jobs. They cannot be used for interactive jobs. For more information, see Run interactive workloads with EMR Serverless through EMR Studio.
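The documented ranges can be checked locally before you call the API. The following is a minimal sketch, assuming both values are passed as whole numbers; validate_scheduler_config is a hypothetical helper name, and the limits are the ones listed above.

```shell
# validate_scheduler_config is a hypothetical helper, not part of the AWS CLI.
# It returns 0 when both values fall inside the documented ranges and
# returns 1 with a message on stderr otherwise.
#   $1: maxConcurrentRuns (valid range 1 to 1000)
#   $2: queueTimeoutMinutes (valid range 15 to 720)
validate_scheduler_config() {
  max_runs=$1
  timeout_minutes=$2
  if [ "$max_runs" -lt 1 ] || [ "$max_runs" -gt 1000 ]; then
    echo "maxConcurrentRuns must be between 1 and 1000" >&2
    return 1
  fi
  if [ "$timeout_minutes" -lt 15 ] || [ "$timeout_minutes" -gt 720 ]; then
    echo "queueTimeoutMinutes must be between 15 and 720" >&2
    return 1
  fi
  return 0
}

# Example usage:
# validate_scheduler_config 5 30 && echo "configuration is within range"
```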