

# Interact with and configure an EMR Serverless application
<a name="applications"></a>

This section covers how to interact with your Amazon EMR Serverless application with the AWS CLI. It also describes how to configure an application, perform customizations, and work with defaults for the Spark and Hive engines.

**Topics**
+ [Application states](#application-states)
+ [Creating an EMR Serverless application from the EMR Studio console](studio.md)
+ [Interacting with your EMR Serverless application on the AWS CLI](applications-cli.md)
+ [Configuring an application when working with EMR Serverless](application-capacity.md)
+ [Customizing an EMR Serverless image](application-custom-image.md)
+ [Configuring VPC access for EMR Serverless applications to connect to data](vpc-access.md)
+ [Amazon EMR Serverless architecture options](architecture.md)
+ [Job concurrency and queuing for an EMR Serverless application](applications-concurrency-queuing.md)

## Application states
<a name="application-states"></a>

When you create an application with EMR Serverless, the application enters the `CREATING` state. It then moves through the following states over the course of its lifecycle, from creation through start, stop, and termination.

Applications can have the following states:



| State | Description | 
| --- | --- | 
| Creating | The application is being prepared and isn't ready to use yet. | 
| Created | The application has been created but hasn't provisioned capacity yet. You can modify the application to change its initial capacity configuration. | 
| Starting | The application is starting and is provisioning capacity. | 
| Started | The application is ready to accept new jobs. The application only accepts jobs when it's in this state. | 
| Stopping | All jobs have completed and the application is releasing its capacity.  | 
| Stopped | The application is stopped and no resources are running on the application. You can modify the application to change its initial capacity configuration. | 
| Terminated | The application has been terminated and doesn't appear on your application list.  | 

The following diagram illustrates the trajectory of EMR Serverless application states.

![EMR Serverless application states.](http://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/images/emr-serverless-application-states.png)
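You can check which of these states an application is in with `get-application` and a `--query` filter. This is a sketch; the application ID is a placeholder that you replace with your own.

```
aws emr-serverless get-application \
--application-id application-id \
--query 'application.state' \
--output text
```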


# Creating an EMR Serverless application from the EMR Studio console
<a name="studio"></a>

You can create, access, and manage EMR Serverless applications from the EMR Studio console. To navigate to the EMR Studio console, follow the instructions in [Getting started from the console](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/getting-started.html#gs-console).

## Create an application
<a name="studio-create-app"></a>

On the **Create application** page, follow these steps to create an EMR Serverless application.

1. In the **Name** field, enter the name you want to call your application.

1. In the **Type** field, choose **Spark** or **Hive** as the application type.

1. In the **Release version** field, choose the EMR release number.

1. In the **Architecture** options, choose the instruction set architecture to use. For more information, refer to [Amazon EMR Serverless architecture options](architecture.md).
   + **arm64** — 64-bit ARM architecture, to use Graviton processors
   + **x86\_64** — 64-bit x86 architecture, to use x86-based processors

1. There are two application setup options for the remaining fields: default settings and custom settings. These fields are optional.

   **Default settings** — Default settings allow you to create an application quickly with pre-initialized capacity. This includes one driver and one executor for Spark, and one driver and one Tez Task for Hive. The default settings don't enable network connectivity to your VPCs. The application is configured to stop if idle for 15 minutes, and auto-starts on job submission.

   **Custom settings** — Custom settings allow you to modify the following properties.
   + **Pre-initialized capacity** — The number of drivers and executors or Hive Tez Task workers, and the size of each worker.
   + **Application limits** — The maximum capacity of an application.
   + **Application behavior** — The application's auto-start and auto-stop behavior.
   + **Network connections** — Network connectivity to VPC resources.
   + **Tags** — Custom tags that you assign to the application.

   For more information about pre-initialized capacity, application limits, and application behavior, refer to [Configuring an application when working with EMR Serverless](application-capacity.md). For more information about network connectivity, refer to [Configuring VPC access for EMR Serverless applications to connect to data](vpc-access.md).

1. To create the application, choose **Create application**.

# List applications from the EMR Studio console
<a name="studio-list-app"></a>

You can access all existing EMR Serverless applications from the **List applications** page. You can choose an application’s name to navigate to the **Details** page for that application.

# Manage applications from the EMR Studio console
<a name="studio-manage-app"></a>

You can perform the following actions on an application from either the **List applications** page or from a specific application’s **Details** page.

**Start application**  
Choose this option to manually start an application.

**Stop application**  
Choose this option to manually stop an application. An application must have no running jobs to be stopped. To learn more about application state transitions, refer to [Application states](applications.md#application-states).

**Configure application**  
Edit the optional settings for an application from the **Configure application** page. You can change most application settings. For example, you can change the release label for an application to upgrade it to a different version of Amazon EMR, or switch the architecture from x86\_64 to arm64. The other optional settings are the same as those in the **Custom settings** section on the **Create application** page. For more information about the application settings, refer to [Create an application](studio.md#studio-create-app).

**Delete application**  
Choose this option to manually delete an application. You must stop an application to delete it. To learn more about application state transitions, refer to [Application states](applications.md#application-states).

# Interacting with your EMR Serverless application on the AWS CLI
<a name="applications-cli"></a>

From the AWS CLI, you can create, describe, and delete individual applications. You can also list all of your applications so that you can access them at a glance. This section describes how to perform these actions. For more application operations, such as starting, stopping, and updating applications, refer to the [EMR Serverless API Reference](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/Welcome.html). For examples of how to use the EMR Serverless API with the AWS SDK for Java, refer to [Java examples](https://github.com/aws-samples/emr-serverless-samples/tree/main/examples/java-api) in our GitHub repository. For examples with the AWS SDK for Python (Boto), refer to [Python examples](https://github.com/aws-samples/emr-serverless-samples/tree/main/examples/python-api) in our GitHub repository.

To create an application, use `create-application`. You must specify `SPARK` or `HIVE` as the application `type`. This command returns the application’s ARN, name, and ID.

```
aws emr-serverless create-application \
--name my-application-name \
--type 'application-type' \
--release-label release-version
```

To describe an application, use `get-application` and provide its `application-id`. This command returns the state and capacity-related configurations for your application.

```
aws emr-serverless get-application \
--application-id application-id
```

To list all of your applications, call `list-applications`. This command returns the same properties as `get-application` but includes all of your applications.

```
aws emr-serverless list-applications
```

To delete your application, call `delete-application` and supply your `application-id`.

```
aws emr-serverless delete-application \
--application-id application-id
```
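Two related lifecycle operations, `start-application` and `stop-application`, follow the same pattern. This is a sketch; supply your own `application-id`.

```
aws emr-serverless start-application \
--application-id application-id

aws emr-serverless stop-application \
--application-id application-id
```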

# Configuring an application when working with EMR Serverless
<a name="application-capacity"></a>

With EMR Serverless, you can configure the applications that you use. For example, you can set the maximum capacity that an application can scale up to, configure pre-initialized capacity to keep drivers and workers ready to respond, and specify a common set of runtime and monitoring configurations at the application level. The following pages describe how to configure applications when you use EMR Serverless.

**Topics**
+ [Understanding application behavior in EMR Serverless](app-behavior.md)
+ [Pre-initialized capacity for working with an application in EMR Serverless](pre-init-capacity.md)
+ [Default application configuration for EMR Serverless](default-configs.md)

# Understanding application behavior in EMR Serverless
<a name="app-behavior"></a>

This section describes job submission behavior, capacity configuration for scaling, and worker configuration settings for EMR Serverless.

## Default application behavior
<a name="auto-start-stop"></a>

**Auto-start** — By default, an application is configured to auto-start when you submit a job. You can turn this feature off.

**Auto-stop** — By default, an application is configured to auto-stop after it has been idle for 15 minutes. When an application changes to the `STOPPED` state, it releases any configured pre-initialized capacity. You can modify the amount of idle time before an application auto-stops, or you can turn this feature off.
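As a sketch, you can adjust both behaviors with `update-application` while the application is in the `CREATED` or `STOPPED` state. The application ID and the 30-minute idle timeout below are placeholder values.

```
aws emr-serverless update-application \
--application-id application-id \
--auto-start-configuration '{"enabled": true}' \
--auto-stop-configuration '{"enabled": true, "idleTimeoutMinutes": 30}'
```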

## Maximum capacity
<a name="max-capacity"></a>

You can configure the maximum capacity that an application can scale up to. You can specify your maximum capacity in terms of CPU, memory (GB), and disk (GB). 

**Note**  
It is best practice to configure your maximum capacity to be proportional to your supported worker sizes by multiplying the number of workers by their sizes. For example, if you want to limit your application to 50 workers with 2 vCPUs, 16 GB for memory, and 20 GB for disk, set your maximum capacity to 100 vCPUs, 800 GB for memory, and 1000 GB for disk. 
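The arithmetic in the note above is a simple multiplication of worker count by per-worker size. The following quick shell calculation uses the example values from the note:

```
workers=50; worker_vcpu=2; worker_mem_gb=16; worker_disk_gb=20

max_cpu=$(( workers * worker_vcpu ))      # 100 vCPU
max_mem=$(( workers * worker_mem_gb ))    # 800 GB
max_disk=$(( workers * worker_disk_gb ))  # 1000 GB

echo "maximumCapacity: ${max_cpu}vCPU, ${max_mem}GB memory, ${max_disk}GB disk"
```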

## Supported worker configurations
<a name="worker-configs"></a>

The following table lists the supported worker configurations and sizes that you can specify for EMR Serverless. You can configure different sizes for drivers and executors based on the needs of your workload.


**Worker configurations and sizes**  

| CPU | Memory | Default ephemeral storage | 
| --- | --- | --- | 
|  1 vCPU  |  Minimum 2 GB, maximum 8 GB, in 1 GB increments  |  20 GB - 200 GB  | 
|  2 vCPU  |  Minimum 4 GB, maximum 16 GB, in 1 GB increments  |  20 GB - 200 GB  | 
|  4 vCPU  |  Minimum 8 GB, maximum 30 GB, in 1 GB increments  |  20 GB - 200 GB  | 
|  8 vCPU  |  Minimum 16 GB, maximum 60 GB, in 4 GB increments  |  20 GB - 200 GB  | 
|  16 vCPU  |  Minimum 32 GB, maximum 120 GB, in 8 GB increments  |  20 GB - 200 GB  | 

**CPU** — Each worker can have 1, 2, 4, 8, or 16 vCPUs.

**Memory** — Each worker has memory, specified in GB, within the limits listed in the earlier table. Spark jobs have a memory overhead, meaning that the memory they use is more than the specified container sizes. This overhead is specified with the properties `spark.driver.memoryOverhead` and `spark.executor.memoryOverhead`. The overhead has a default value of 10% of container memory, with a minimum of 384 MB. You should consider this overhead when you choose worker sizes. 

For example, if you choose 4 vCPUs for your worker instance with 30 GB of pre-initialized memory capacity, set a value of approximately 27 GB as executor memory for your Spark job. This maximizes the utilization of your pre-initialized capacity: 27 GB of executor memory, plus 10% of 27 GB (2.7 GB) of overhead, totals 29.7 GB, which fits within the 30 GB worker.

**Disk** — You can configure each worker with temporary storage disks with a minimum size of 20 GB and a maximum of 200 GB. You only pay for additional storage beyond 20 GB that you configure per worker.
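To pick an executor memory setting that maximizes a pre-initialized worker, you can work backward from the worker's memory and the 10% default overhead. This sketch uses integer GB arithmetic and the 30 GB example from above:

```
worker_mem_gb=30

# Largest executor memory (GB) such that memory plus 10% overhead fits in the worker:
# executor * 1.1 <= worker  =>  executor <= worker * 10 / 11
executor_mem_gb=$(( worker_mem_gb * 10 / 11 ))  # 27 GB; 27 + 2.7 GB overhead = 29.7 GB

echo "Set spark.executor.memory to ${executor_mem_gb}G"
```

Note that at very small worker sizes the 384 MB minimum overhead, rather than the 10% rate, becomes the binding constraint.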

# Pre-initialized capacity for working with an application in EMR Serverless
<a name="pre-init-capacity"></a>

EMR Serverless provides an optional feature that keeps drivers and workers pre-initialized and ready to respond in seconds, effectively creating a warm pool of workers for an application. This feature is called *pre-initialized capacity*. To configure it, set the `initialCapacity` parameter of an application to the number of workers that you want to pre-initialize. With pre-initialized worker capacity, jobs start immediately, which is ideal for iterative applications and time-sensitive jobs.

Pre-initialized capacity keeps a warm pool of workers ready so that jobs and sessions can start in seconds. You pay for provisioned pre-initialized workers even when the application is idle, so we suggest enabling this feature for use cases that benefit from the fast start-up time and sizing it for optimal utilization of resources. EMR Serverless applications automatically stop when idle. We suggest keeping that feature on when you use pre-initialized workers to avoid unexpected charges.

When you submit a job, if workers from `initialCapacity` are available, the job uses those resources to start its run. If those workers are already in use by other jobs, or if the job needs more resources than available from `initialCapacity`, then the application requests and gets additional workers, up to the maximum limits on resources set for the application. When a job finishes its run, it releases the workers that it used, and the number of resources available for the application returns to `initialCapacity`. An application maintains the `initialCapacity` of resources even after jobs finish their runs. The application releases excess resources beyond `initialCapacity` when the jobs no longer need them to run.

Pre-initialized capacity is available and ready to use when the application has started, and becomes inactive when the application is stopped. An application moves to the `STARTED` state only after the requested pre-initialized capacity has been created and is ready to use. The whole time that the application is in the `STARTED` state, EMR Serverless keeps the pre-initialized capacity available for use, or in use, by jobs or interactive workloads. The feature restores capacity for released or failed containers to maintain the number of workers that the `initialCapacity` parameter specifies. The state of an application with no pre-initialized capacity can change from `CREATED` to `STARTED` immediately.

 You can configure the application to release pre-initialized capacity if it isn't used for a certain period of time, with a default of 15 minutes. A stopped application starts automatically when you submit a new job. You can set these automatic start and stop configurations when you create the application, or change them when the application is in a `CREATED` or `STOPPED` state.

You can change the `initialCapacity` counts and specify compute configurations, such as CPU, memory, and disk, for each worker. Because you can't make partial modifications, specify all compute configurations when you change values. You can only change configurations when the application is in the `CREATED` or `STOPPED` state.
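As a sketch of such a full (non-partial) modification, the following `update-application` call respecifies the complete worker configuration for a Spark application; the application ID, counts, and sizes are placeholder values.

```
aws emr-serverless update-application \
--application-id application-id \
--initial-capacity '{
    "DRIVER": {
        "workerCount": 2,
        "workerConfiguration": {
            "cpu": "2vCPU",
            "memory": "4GB"
        }
    },
    "EXECUTOR": {
        "workerCount": 10,
        "workerConfiguration": {
            "cpu": "4vCPU",
            "memory": "8GB",
            "disk": "20GB"
        }
    }
}'
```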

**Note**  
To optimize your application’s use of resources, we suggest aligning your container sizes with your pre-initialized capacity worker sizes. For example, if you configure your Spark executor size to 2 CPUs and your memory to 8 GB, but your pre-initialized capacity worker size is 4 CPUs with 16 GB of memory, then the Spark executors only use half of the workers’ resources when they are assigned to this job.

## Customizing pre-initialized capacity for Spark and Hive
<a name="customizing-capacity"></a>

You can further customize pre-initialized capacity for workloads that run on specific big data frameworks. For example, when a workload runs on Apache Spark, specify how many workers start as drivers and how many start as executors. Similarly, when you use Apache Hive, specify how many workers start as Hive drivers, and how many should run Tez tasks.

**Configuring an application running Apache Hive with pre-initialized capacity**

The following API request creates an application running Apache Hive based on Amazon EMR release emr-6.6.0. The application starts with 5 pre-initialized Hive drivers, each with 2 vCPU and 4 GB of memory, and 50 pre-initialized Tez task workers, each with 4 vCPU and 8 GB of memory. When Hive queries run on this application, they first use the pre-initialized workers and start executing immediately. If all of the pre-initialized workers are busy and more Hive jobs are submitted, the application can scale to a total of 400 vCPU and 1024 GB of memory. You can optionally omit capacity for either the `DRIVER` or the `TEZ_TASK` worker.

```
aws emr-serverless create-application \
  --type "HIVE" \
  --name my-application-name \
  --release-label emr-6.6.0 \
  --initial-capacity '{
    "DRIVER": {
        "workerCount": 5,
        "workerConfiguration": {
            "cpu": "2vCPU",
            "memory": "4GB"
        }
    },
    "TEZ_TASK": {
        "workerCount": 50,
        "workerConfiguration": {
            "cpu": "4vCPU",
            "memory": "8GB"
        }
    }
  }' \
  --maximum-capacity '{
    "cpu": "400vCPU",
    "memory": "1024GB"
  }'
```

**Configuring an application running Apache Spark with pre-initialized capacity**

The following API request creates an application that runs Apache Spark 3.2.0 based on Amazon EMR release 6.6.0. The application starts with 5 pre-initialized Spark drivers, each with 2 vCPU and 4 GB of memory, and 50 pre-initialized executors, each with 4 vCPU and 8 GB of memory. When Spark jobs run on this application, they first use the pre-initialized workers and start to execute immediately. If all of the pre-initialized workers are busy and more Spark jobs are submitted, the application can scale to a total of 400 vCPU and 1024 GB of memory. You can optionally omit capacity for either the `DRIVER` or the `EXECUTOR`.

**Note**  
Spark adds a configurable memory overhead, with a default value of 10%, to the memory requested for drivers and executors. For jobs to use pre-initialized workers, the initial capacity memory configuration must be greater than the sum of the memory that the job requests and the memory overhead.

```
aws emr-serverless create-application \
  --type "SPARK" \
  --name my-application-name \
  --release-label emr-6.6.0 \
  --initial-capacity '{
    "DRIVER": {
        "workerCount": 5,
        "workerConfiguration": {
            "cpu": "2vCPU",
            "memory": "4GB"
        }
    },
    "EXECUTOR": {
        "workerCount": 50,
        "workerConfiguration": {
            "cpu": "4vCPU",
            "memory": "8GB"
        }
    }
  }' \
  --maximum-capacity '{
    "cpu": "400vCPU",
    "memory": "1024GB"
  }'
```

# Default application configuration for EMR Serverless
<a name="default-configs"></a>

You can specify a common set of runtime and monitoring configurations at the application level for all the jobs that you submit under the same application. This reduces the overhead of submitting the same configurations with each job.

You can work with configurations at the following points in time:
+ [Declare configurations at the application level.](#default-configs-declare)
+ [Override default configurations during a job run.](#default-configs-override)

The following sections provide more details and an example for further context.

## Declaring configurations at the application level
<a name="default-configs-declare"></a>

You can specify application-level logging and runtime configuration properties for the jobs that you submit under the application.

**`monitoringConfiguration`**  
To specify the log configurations for jobs that you submit with the application, use the [MonitoringConfiguration](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_MonitoringConfiguration.html) field. For more information on logging for EMR Serverless, refer to [Storing logs](logging.md).

**`runtimeConfiguration`**  
To specify runtime configuration properties such as `spark-defaults`, provide a configuration object in the `runtimeConfiguration` field. This affects the default configurations for all the jobs that you submit with the application. For more information, refer to [Hive configuration override parameter](jobs-hive.md#hive-defaults-configurationOverrides) and [Spark configuration override parameter](jobs-spark.md#spark-defaults-configurationOverrides).  
Available configuration classifications vary by specific EMR Serverless release. For example, classifications for custom Log4j `spark-driver-log4j2` and `spark-executor-log4j2` are only available with releases 6.8.0 and higher. For a list of application-specific properties, refer to [Spark job properties](jobs-spark.md#spark-defaults) and [Hive job properties](jobs-hive.md#hive-defaults).  
You can also configure [Apache Log4j2 properties](log4j2.md), [AWS Secrets Manager for data protection](secrets-manager.md), and [Java 17 runtime](using-java-runtime.md) at the application level.  
To pass Secrets Manager secrets at the application level, attach the following policy to users and roles that need to create or update EMR Serverless applications with secrets.    

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SecretsManagerPolicy",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": [
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret-name-123abc"
      ]
    },
    {
      "Sid": "KMSDecryptPolicy",
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt"
      ],
      "Resource": [
        "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
      ]
    }
  ]
}
```
For more information on creating custom policies for secrets, refer to [Permissions policy examples for AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/auth-and-access_examples.html) in the *AWS Secrets Manager User Guide*.

**Note**  
The `runtimeConfiguration` that you specify at the application level maps to `applicationConfiguration` in the [StartJobRun](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html) API.

### Example declaration
<a name="default-configs-declare-example"></a>

The following example shows how to declare default configurations with `create-application`.

```
aws emr-serverless create-application \
    --release-label release-version  \
    --type SPARK \
    --name my-application-name \
    --runtime-configuration '[
        {
            "classification": "spark-defaults",
            "properties": {
                "spark.driver.cores": "4",
                "spark.executor.cores": "2",
                "spark.driver.memory": "8G",
                "spark.executor.memory": "8G",
                "spark.executor.instances": "2",
                "spark.hadoop.javax.jdo.option.ConnectionDriverName":"org.mariadb.jdbc.Driver",
                "spark.hadoop.javax.jdo.option.ConnectionURL":"jdbc:mysql://db-host:db-port/db-name",
                "spark.hadoop.javax.jdo.option.ConnectionUserName":"connection-user-name",
                "spark.hadoop.javax.jdo.option.ConnectionPassword": "EMR.secret@SecretID"
            }
        },
        {
            "classification": "spark-driver-log4j2",
            "properties": {
                "rootLogger.level":"error", 
                "logger.IdentifierForClass.name": "classpathForSettingLogger",
                "logger.IdentifierForClass.level": "info"
            }
        }
    ]' \
    --monitoring-configuration '{
        "s3MonitoringConfiguration": {
            "logUri": "s3://amzn-s3-demo-logging-bucket/logs/app-level"
        },
        "managedPersistenceMonitoringConfiguration": {
            "enabled": false
        }
    }'
```

## Overriding configurations during a job run
<a name="default-configs-override"></a>

You can specify configuration overrides for the application configuration and monitoring configuration with the [StartJobRun](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html) API. EMR Serverless then merges the configurations that you specify at the application level and the job level to determine the configurations for the job run.

The granularity level when the merge occurs is as follows:
+ **[applicationConfiguration](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_ConfigurationOverrides.html#emrserverless-Type-ConfigurationOverrides-applicationConfiguration)** - Classification type, for example `spark-defaults`.
+ **[monitoringConfiguration](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_ConfigurationOverrides.html#emrserverless-Type-ConfigurationOverrides-monitoringConfiguration)** - Configuration type, for example `s3MonitoringConfiguration`.

**Note**  
Configurations that you provide at [StartJobRun](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html) supersede the configurations that you provide at the application level.

For more information about priority rankings, refer to [Hive configuration override parameter](jobs-hive.md#hive-defaults-configurationOverrides) and [Spark configuration override parameter](jobs-spark.md#spark-defaults-configurationOverrides).

When you start a job, if you don't specify a particular configuration, the job inherits it from the application. If you declare configurations at the job level, you can perform the following operations:
+ **Override an existing configuration** - Provide the same configuration parameter in the `StartJobRun` request with your override values. 
+ **Add an additional configuration** - Add the new configuration parameter in the `StartJobRun` request with the values that you want to specify.
+ **Remove an existing configuration** - To remove an application *runtime configuration*, provide the key for the configuration that you want to remove, and pass an empty declaration `{}` for the configuration. We don't recommend removing any classifications that contain parameters that are required for a job run. For example, if you try to remove the [required properties for a Hive job](https://docs.aws.amazon.com/), the job will fail.

  To remove an application *monitoring configuration*, use the appropriate method for the relevant configuration type:
  + **`cloudWatchLoggingConfiguration`** - To remove `cloudWatchLogging`, pass the enabled flag as `false`. 
  + **`managedPersistenceMonitoringConfiguration`** - To remove managed persistence settings and fall back to the default enabled state, pass an empty declaration `{}` for the configuration. 
  + **`s3MonitoringConfiguration`** - To remove `s3MonitoringConfiguration`, pass an empty declaration `{}` for the configuration.

### Example override
<a name="default-configs-override-example"></a>

The following example shows different operations that you can perform during job submission with `start-job-run`.

```
aws emr-serverless start-job-run \
    --application-id your-application-id \
    --execution-role-arn your-job-role-arn \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py", 
            "entryPointArguments": ["s3://amzn-s3-demo-destination-bucket1/wordcount_output"]
        }
    }' \
    --configuration-overrides '{
        "applicationConfiguration": [ 
            {
                // Override existing configuration for spark-defaults in the application
                "classification": "spark-defaults", 
                "properties": {
                    "spark.driver.cores": "2",
                    "spark.executor.cores": "1",
                    "spark.driver.memory": "4G",
                    "spark.executor.memory": "4G"
                }
            },
            {
                // Add configuration for spark-executor-log4j2
                "classification": "spark-executor-log4j2",
                "properties": {
                    "rootLogger.level": "error", 
                    "logger.IdentifierForClass.name": "classpathForSettingLogger",
                    "logger.IdentifierForClass.level": "info"
                }
            },
            {
                // Remove existing configuration for spark-driver-log4j2 from the application
                "classification": "spark-driver-log4j2",
                "properties": {}
            }
        ],
        "monitoringConfiguration": {
            "managedPersistenceMonitoringConfiguration": {
                // Override existing configuration for managed persistence
                "enabled": true
            },
            "s3MonitoringConfiguration": {
                // Remove configuration of S3 monitoring
            },
            "cloudWatchLoggingConfiguration": {
                // Add configuration for CloudWatch logging
                "enabled": true
            }
        }
    }'
```

At the time of job execution, the following classifications and configurations will apply based on the priority override ranking described in [Hive configuration override parameter](jobs-hive.md#hive-defaults-configurationOverrides) and [Spark configuration override parameter](jobs-spark.md#spark-defaults-configurationOverrides).
+ The classification `spark-defaults` will be updated with the properties specified at the job level. Only the properties included in `StartJobRun` are considered for this classification.
+ The classification `spark-executor-log4j2` will be added in the existing list of classifications.
+ The classification `spark-driver-log4j2` will be removed.
+ The configurations for `managedPersistenceMonitoringConfiguration` will be updated with configurations at job level.
+ The configurations for `s3MonitoringConfiguration` will be removed.
+ The configurations for `cloudWatchLoggingConfiguration` will be added to existing monitoring configurations.

# Customizing an EMR Serverless image
<a name="application-custom-image"></a>

Starting with Amazon EMR 6.9.0, you can use custom images to package application dependencies and runtime environments into a single container with Amazon EMR Serverless. This simplifies how you manage workload dependencies and makes your packages more portable. Customizing your EMR Serverless image provides the following benefits:
+ Installs and configures packages optimized for your workloads, including packages that aren't widely available in the public distribution of Amazon EMR runtime environments.
+ Integrates EMR Serverless with the established build, test, and deployment processes within your organization, including local development and testing.
+ Applies established security processes, such as image scanning, that meet compliance and governance requirements within your organization.
+ Lets you use your own versions of the JDK and Python for your applications.

EMR Serverless provides base images to use when you create your own images. The base image provides the essential jars, configuration, and libraries that the image needs to interact with EMR Serverless. You can find the base image in the [Amazon ECR Public Gallery](https://gallery.ecr.aws/emr-serverless/). Use the image that matches your application type (Spark or Hive) and release version. For example, if you create an application on Amazon EMR release 6.9.0, use the following images.


| Type | Image | 
| --- | --- | 
|  Spark  |  `public.ecr.aws/emr-serverless/spark/emr-6.9.0:latest`  | 
|  Hive  |  `public.ecr.aws/emr-serverless/hive/emr-6.9.0:latest`  | 

## Prerequisites
<a name="worker-configs"></a>

Before you create an EMR Serverless custom image, complete these prerequisites.

1. Create an Amazon ECR repository in the same AWS Region that you use to launch EMR Serverless applications. To create an Amazon ECR private repository, refer to [Creating a private repository](https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-create.html).

1. To grant users access to your Amazon ECR repository, add a policy like the following to the users and roles that create or update EMR Serverless applications with images from this repository.

------
#### [ JSON ]

****  

   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "ECRRepositoryListGetPolicy",
         "Effect": "Allow",
         "Action": [
           "ecr:GetDownloadUrlForLayer",
           "ecr:BatchGetImage",
           "ecr:DescribeImages"
         ],
         "Resource": [
           "arn:aws:ecr:*:123456789012:repository/my-repo"
         ]
       }
     ]
   }
   ```

------

   For more examples of Amazon ECR identity-based policies, refer to [Amazon Elastic Container Registry identity-based policy examples](https://docs.aws.amazon.com/AmazonECR/latest/userguide/security_iam_id-based-policy-examples.html).

## Step 1: Create a custom image from EMR Serverless base images
<a name="create-image"></a>

First, create a [Dockerfile](https://docs.docker.com/engine/reference/builder/) that begins with a `FROM` instruction that uses your preferred base image. After the `FROM` instruction, include any modifications that you want to make to the image. The base image automatically sets `USER` to `hadoop`, and that user doesn't have permissions for every modification that you might include. As a workaround, set `USER` to `root`, modify the image, and then set `USER` back to `hadoop:hadoop`. For samples of common use cases, refer to [Using custom images with EMR Serverless](using-custom-images.md).

```
# Dockerfile
FROM public.ecr.aws/emr-serverless/spark/emr-6.9.0:latest

USER root
# MODIFICATIONS GO HERE

# EMRS runs the image as hadoop
USER hadoop:hadoop
```

After you have the Dockerfile, build the image with the following command.

```
# build the docker image
docker build . -t aws-account-id.dkr.ecr.region.amazonaws.com/my-repository[:tag]or[@digest]
```
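The registry hostname in the tag follows the standard private Amazon ECR pattern, `account-id.dkr.ecr.region.amazonaws.com`. If you script your builds, a small helper (hypothetical, shown here in Python) can assemble the URI consistently:

```python
# Build a private Amazon ECR image URI of the form
#   <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
# The account ID, region, and repository below are placeholders.

def ecr_image_uri(account_id, region, repository, tag=None, digest=None):
    base = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}"
    if digest:
        return f"{base}@{digest}"   # pin by digest when given
    if tag:
        return f"{base}:{tag}"      # otherwise fall back to a tag
    return base

uri = ecr_image_uri("123456789012", "us-west-2", "my-repository", tag="latest")
```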

## Step 2: Validate image locally
<a name="validate"></a>

EMR Serverless provides an offline tool that can statically check your custom image to validate basic files, environment variables, and correct image configurations. For information on how to install and run the tool, refer to [the Amazon EMR Serverless Image CLI GitHub](https://github.com/awslabs/amazon-emr-serverless-image-cli).

After you install the tool, run the following command to validate an image:

```
amazon-emr-serverless-image \
validate-image -r emr-6.9.0 -t spark \
-i aws-account-id.dkr.ecr.region.amazonaws.com/my-repository:tag/@digest
```

The output appears similar to the following.

```
Amazon EMR Serverless - Image CLI
Version: 0.0.1
... Checking if docker cli is installed
... Checking Image Manifest
[INFO] Image ID: 9e2f4359cf5beb466a8a2ed047ab61c9d37786c555655fc122272758f761b41a
[INFO] Created On: 2022-12-02T07:46:42.586249984Z
[INFO] Default User Set to hadoop:hadoop : PASS
[INFO] Working Directory Set to  : PASS
[INFO] Entrypoint Set to /usr/bin/entrypoint.sh : PASS
[INFO] HADOOP_HOME is set with value: /usr/lib/hadoop : PASS
[INFO] HADOOP_LIBEXEC_DIR is set with value: /usr/lib/hadoop/libexec : PASS
[INFO] HADOOP_USER_HOME is set with value: /home/hadoop : PASS
[INFO] HADOOP_YARN_HOME is set with value: /usr/lib/hadoop-yarn : PASS
[INFO] HIVE_HOME is set with value: /usr/lib/hive : PASS
[INFO] JAVA_HOME is set with value: /etc/alternatives/jre : PASS
[INFO] TEZ_HOME is set with value: /usr/lib/tez : PASS
[INFO] YARN_HOME is set with value: /usr/lib/hadoop-yarn : PASS
[INFO] File Structure Test for hadoop-files in /usr/lib/hadoop: PASS
[INFO] File Structure Test for hadoop-jars in /usr/lib/hadoop/lib: PASS
[INFO] File Structure Test for hadoop-yarn-jars in /usr/lib/hadoop-yarn: PASS
[INFO] File Structure Test for hive-bin-files in /usr/bin: PASS
[INFO] File Structure Test for hive-jars in /usr/lib/hive/lib: PASS
[INFO] File Structure Test for java-bin in /etc/alternatives/jre/bin: PASS
[INFO] File Structure Test for tez-jars in /usr/lib/tez: PASS
-----------------------------------------------------------------
Overall Custom Image Validation Succeeded.
-----------------------------------------------------------------
```
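If you run the validator in a CI pipeline, you might want to fail the build on any failed check. The following is a minimal sketch that scans output in the format shown above; the log format is taken from the sample output, so treat it as illustrative:

```python
# Scan Amazon EMR Serverless image-CLI style output for failed checks.
# The "[INFO] ... : PASS/FAIL" line format is taken from the sample above.

def failed_checks(output):
    """Return the lines that report a FAIL result."""
    return [
        line for line in output.splitlines()
        if line.startswith("[INFO]") and line.rstrip().endswith(": FAIL")
    ]

sample = """[INFO] Default User Set to hadoop:hadoop : PASS
[INFO] Entrypoint Set to /usr/bin/entrypoint.sh : FAIL"""
bad = failed_checks(sample)
```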

## Step 3: Upload the image to your Amazon ECR repository
<a name="upload-image"></a>

Push your Amazon ECR image to your Amazon ECR repository with the following commands. Ensure you have the correct IAM permissions to push the image to your repository. For more information, refer to [Pushing an image](https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-push.html) in the *Amazon ECR User Guide*.

```
# login to ECR repo
aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws-account-id.dkr.ecr.region.amazonaws.com

# push the docker image
docker push aws-account-id.dkr.ecr.region.amazonaws.com/my-repository:tag/@digest
```

## Step 4: Create or update an application with custom images
<a name="create-app"></a>

Choose the AWS Management Console tab or AWS CLI tab according to how you want to launch your application, then complete the following steps.

------
#### [ Console ]

1. Sign in to the EMR Studio console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr). Navigate to your application, or create a new application with the instructions in [Create an application](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/studio.html#studio-create-app).

1. To specify custom images when you create or update an EMR Serverless application, select **Custom settings** in the application setup options.

1. In the **Custom image settings** section, select the **Use the custom image with this application** check box.

1. Paste the Amazon ECR image URI into the **Image URI** field. EMR Serverless uses this image for all worker types for the application. Alternatively, you can choose **Different custom images** and paste different Amazon ECR image URIs for each worker type.

------
#### [ CLI ]
+ Create an application with the `image-configuration` parameter. EMR Serverless applies this setting to all worker types.

  ```
  aws emr-serverless create-application \
  --release-label emr-6.9.0 \
  --type SPARK \
  --image-configuration '{
      "imageUri": "aws-account-id.dkr.ecr.region.amazonaws.com/my-repository:tag/@digest"
  }'
  ```

  To create an application with different image settings for each worker type, use the `worker-type-specifications` parameter.

  ```
  aws emr-serverless create-application \
  --release-label emr-6.9.0 \
  --type SPARK \
  --worker-type-specifications '{
      "Driver": {
          "imageConfiguration": {
              "imageUri": "aws-account-id.dkr.ecr.region.amazonaws.com/my-repository:tag/@digest"
          }
      },
      "Executor" : {
          "imageConfiguration": {
              "imageUri": "aws-account-id.dkr.ecr.region.amazonaws.com/my-repository:tag/@digest"
          }
      }
  }'
  ```

  To update an application, use the `image-configuration` parameter. EMR Serverless applies this setting to all worker types.

  ```
  aws emr-serverless update-application \
  --application-id application-id \
  --image-configuration '{
      "imageUri": "aws-account-id.dkr.ecr.region.amazonaws.com/my-repository:tag/@digest"
  }'
  ```

------
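If you generate the `worker-type-specifications` payload from a script rather than typing the JSON inline, a sketch like the following keeps the driver and executor entries consistent. The image URIs here are placeholders:

```python
import json

# Build the worker-type-specifications payload used by create-application.
# The image URIs passed in below are placeholders, not real repositories.

def worker_type_specs(driver_image, executor_image):
    spec = {
        "Driver": {"imageConfiguration": {"imageUri": driver_image}},
        "Executor": {"imageConfiguration": {"imageUri": executor_image}},
    }
    return json.dumps(spec)

payload = worker_type_specs(
    "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-repo:driver",
    "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-repo:executor",
)
```

You can then pass the resulting string directly as the `--worker-type-specifications` argument.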

## Step 5: Allow EMR Serverless to access the custom image repository
<a name="access-repo"></a>

Add the following resource policy to the Amazon ECR repository to allow the EMR Serverless service principal to make get, describe, and download requests to this repository.

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EmrServerlessCustomImageSupport",
      "Effect": "Allow",
      "Principal": {
        "Service": "emr-serverless.amazonaws.com"
      },
      "Action": [
        "ecr:BatchGetImage",
        "ecr:DescribeImages",
        "ecr:GetDownloadUrlForLayer"
      ],
      "Resource": "arn:aws:ecr:*:123456789012:repository/my-repo",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "arn:aws:emr-serverless:*:123456789012:/applications/*"
        }
      }
    }
  ]
}
```

------

As a security best practice, add an `aws:SourceArn` condition key to the repository policy. The IAM global condition key `aws:SourceArn` ensures that EMR Serverless uses the repository only on behalf of your application ARNs. For more information on Amazon ECR repository policies, refer to [Amazon ECR repository policies](https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-policies.html).
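If you want to scope the condition to a single application instead of the `applications/*` wildcard shown above, you can generate the policy document programmatically. The following sketch uses placeholder ARNs and assumes the `emr-serverless.amazonaws.com` service principal:

```python
import json

# Build an ECR repository resource policy that allows EMR Serverless to pull
# images only on behalf of one specific application. All ARNs are placeholders.

def emr_serverless_repo_policy(repo_arn, application_arn):
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "EmrServerlessCustomImageSupport",
            "Effect": "Allow",
            "Principal": {"Service": "emr-serverless.amazonaws.com"},
            "Action": [
                "ecr:BatchGetImage",
                "ecr:DescribeImages",
                "ecr:GetDownloadUrlForLayer",
            ],
            "Resource": repo_arn,
            # Restrict use of the repository to this one application ARN.
            "Condition": {"ArnLike": {"aws:SourceArn": application_arn}},
        }],
    }
    return json.dumps(policy, indent=2)

policy_json = emr_serverless_repo_policy(
    "arn:aws:ecr:us-west-2:123456789012:repository/my-repo",
    "arn:aws:emr-serverless:us-west-2:123456789012:/applications/00example",
)
```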

## Considerations and limitations
<a name="considerations"></a>

When you work with custom images, consider the following:
+ Use the correct base image that matches the type (Spark or Hive) and release label (for example, `emr-6.9.0`) for your application.
+ EMR Serverless ignores `[CMD]` and `[ENTRYPOINT]` instructions in the Dockerfile. Use common instructions in the Dockerfile, such as `[COPY]`, `[RUN]`, and `[WORKDIR]`.
+ Do not modify the environment variables `JAVA_HOME`, `SPARK_HOME`, `HIVE_HOME`, or `TEZ_HOME` when you create a custom image.
+ Custom images can't exceed 10 GB in size.
+ If you modify binaries or jars in the Amazon EMR base images, this can cause application or job launch failures.
+ The Amazon ECR repository must be in the same AWS Region that you use to launch EMR Serverless applications.

# Configuring VPC access for EMR Serverless applications to connect to data
<a name="vpc-access"></a>

You can configure EMR Serverless applications to connect to data stores within your VPC, such as Amazon Redshift clusters, Amazon RDS databases, or Amazon S3 buckets with VPC endpoints. Your EMR Serverless application has outbound connectivity to the data stores within your VPC. By default, EMR Serverless blocks both inbound access to your applications and outbound internet access to enhance security.

**Note**  
You must configure VPC access if you want to use an external Hive metastore database for your application. For information about how to configure an external Hive metastore, refer to [Metastore configuration](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/metastore-config.html).

## Create application
<a name="vpc-create-app"></a>

On the **Create application** page, choose custom settings and specify the VPC, subnets and security groups that EMR Serverless applications can use.

### VPCs
<a name="vpc-create-vpc"></a>

Choose the name of the virtual private cloud (VPC) that contains your data stores. The **Create application** page lists all VPCs for your chosen AWS Region.

### Subnets
<a name="vpc-create-subnet"></a>

Choose the subnets within the VPC that contain your data stores. The **Create application** page lists all subnets for the data stores in your VPC. Both public and private subnets are supported; you can pass either to your applications. Each option has a few considerations to be aware of.

For private subnets:
+ The associated route tables must not have a route to an internet gateway.
+ For outbound connectivity to the internet, if needed, configure outbound routes using a NAT Gateway. To configure a NAT Gateway, refer to [NAT gateways](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html#nat-gateway-working-with).
+ For Amazon S3 connectivity, configure either a NAT Gateway or a VPC endpoint. To configure an S3 VPC endpoint, refer to [Create a gateway endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html#create-gateway-endpoint-s3).
+ If you configure an S3 VPC endpoint and you attach an endpoint policy to control access, follow the instructions in [Logging for EMR Serverless with managed storage](logging.html#jobs-log-storage-managed-storage) to provide permissions for EMR Serverless to store and serve application logs.
+ For connectivity to other AWS services outside the VPC, such as to Amazon DynamoDB, configure either VPC endpoints or a NAT gateway. To configure VPC endpoints for AWS services, refer to [Work with VPC endpoints](https://docs.aws.amazon.com/vpc/latest/privatelink/what-is-privatelink.html#working-with-privatelink).

**Note**  
When you set up an Amazon EMR Serverless application in a private subnet, we suggest that you also set up VPC endpoints for Amazon S3. If your EMR Serverless application is in a private subnet without VPC endpoints for Amazon S3, you incur additional NAT gateway charges that are associated with S3 traffic. This is because the traffic between your EMR application and Amazon S3 will not stay within your VPC when VPC endpoints aren't configured.

For public subnets:
+ These subnets have a route to an internet gateway.
+ Ensure that your security groups are configured properly to control outbound traffic.

Workers can connect to the data stores within your VPC through outbound traffic. To improve security, EMR Serverless blocks inbound access to workers by default.

When you use AWS Config, EMR Serverless creates a configuration item record for each worker's elastic network interface. To avoid costs related to these records, consider turning off recording for the `AWS::EC2::NetworkInterface` resource type in AWS Config.

**Note**  
We suggest that you select multiple subnets across multiple Availability Zones, because the subnets that you choose determine the Availability Zones that are available for an EMR Serverless application to launch in. Each worker consumes an IP address in the subnet where it launches, so ensure that the specified subnets have sufficient IP addresses for the number of workers that you plan to launch. For more information on subnet planning, refer to [Best practices for subnet planning](#subnet-best-practices).

#### Considerations and limitations for subnets
<a name="vpc-create-subnet-considerations"></a>
+ EMR Serverless with public subnets does not support AWS Lake Formation.
+ Inbound traffic isn't supported for public subnets.

### Security groups
<a name="vpc-create-sg"></a>

Choose one or more security groups that can communicate with your data stores. The **Create application** page lists all security groups in your VPC. EMR Serverless associates these security groups with elastic network interfaces that are attached to your VPC subnets.

**Note**  
We suggest that you create a separate security group for EMR Serverless applications. EMR Serverless does not allow you to **Create**/**Update**/**Start application** if security groups have ports open to the public internet on the **0.0.0.0/0** or **::/0** range. A separate security group provides enhanced security and isolation, and makes network rules more manageable. For example, it blocks unexpected traffic to workers with public IP addresses. To communicate with Amazon Redshift clusters, for instance, define the traffic rules between the Redshift and EMR Serverless security groups, as demonstrated in the example in the following section.

**Example — Communication with Amazon Redshift clusters**  

1. Add a rule for inbound traffic to the Amazon Redshift security group from one of the EMR Serverless security groups.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/vpc-access.html)

1. Add a rule for outbound traffic from one of the EMR Serverless security groups. Do this in one of two ways. First, open outbound traffic to all ports.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/vpc-access.html)

   Alternatively, you can restrict outbound traffic to Amazon Redshift clusters. This is useful only when the application must communicate with Amazon Redshift clusters and nothing else.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/vpc-access.html)

## Configure application
<a name="vpc-configure-app"></a>

You can change the network configuration for an existing EMR Serverless application from the **Configure application** page.

### Access job run details
<a name="vpc-configure-access-details"></a>

On the **Job run detail** page, you can see the subnet that your job used for a specific run. Note that a job runs in only one subnet, selected from the subnets that you specified.

## Best practices for subnet planning
<a name="subnet-best-practices"></a>

AWS resources are created in a subnet, which is a subset of the available IP addresses in an Amazon VPC. For example, a VPC with a /16 netmask has up to 65,536 available IP addresses, which you can break into multiple smaller networks with subnet masks. For example, you can split this range into two subnets, each with a /17 mask and 32,768 available IP addresses. A subnet resides within a single Availability Zone and cannot span zones.

Design your subnets with your EMR Serverless application scaling limits in mind. For example, if your application requests 4 vCPU workers and can scale up to 4,000 vCPUs, then it requires at most 1,000 workers, for a total of 1,000 network interfaces. We suggest that you create subnets across multiple Availability Zones so that EMR Serverless can retry your job or provision pre-initialized capacity in a different Availability Zone in the unlikely event that an Availability Zone fails. Therefore, each subnet in at least two Availability Zones should have more than 1,000 available IP addresses.

To provision 1,000 network interfaces, you need subnets with a mask size lower than or equal to /22; any mask greater than /22 does not provide enough addresses. For example, a /23 subnet mask provides 512 IP addresses, a /22 mask provides 1,024, and a /21 mask provides 2,048. The following is an example of four subnets with a /22 mask in a VPC with a /16 netmask, which you can allocate to different Availability Zones. Available and usable IP addresses differ by five because AWS reserves the first four IP addresses and the last IP address in each subnet.


| Subnet ID | Subnet Address | Subnet Mask | IP Address Range | Available IP Addresses | Usable IP Addresses | 
| --- | --- | --- | --- | --- | --- | 
|  1  |  10.0.0.0  |  255.255.252.0/22  |  10.0.0.0 - 10.0.3.255  |  1,024  |  1,019  | 
|  2  |  10.0.4.0  |  255.255.252.0/22  |  10.0.4.0 - 10.0.7.255  |  1,024  |  1,019  | 
|  3  |  10.0.8.0  |  255.255.252.0/22  |  10.0.8.0 - 10.0.11.255  |  1,024  |  1,019  | 
|  4  |  10.0.12.0  |  255.255.252.0/22  |  10.0.12.0 - 10.0.15.255  |  1,024  |  1,019  | 

Evaluate whether your workload is better suited to larger worker sizes, which require fewer network interfaces. For example, using 16 vCPU workers with an application scaling limit of 4,000 vCPUs requires at most 250 workers, for a total of 250 IP addresses to provision network interfaces. In that case, you need subnets in multiple Availability Zones with a mask size lower than or equal to /24 to provision 250 network interfaces; any mask size greater than /24 offers fewer than 250 usable IP addresses.

If you share subnets across multiple applications, design each subnet with the collective scaling limits of all your applications in mind. For example, if you have three applications that request 4 vCPU workers, each can scale up to 4,000 vCPUs, and your account-level service quota is 12,000 vCPUs, then each subnet requires 3,000 available IP addresses. If the VPC that you want to use doesn't have a sufficient number of IP addresses, try to increase the number of available IP addresses by associating additional Classless Inter-Domain Routing (CIDR) blocks with your VPC. For more information, refer to [Associate additional IPv4 CIDR blocks with your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/working-with-vpcs.html#add-ipv4-cidr) in the *Amazon VPC User Guide*.

You can use one of the many tools available online to quickly generate subnet definitions and review their available range of IP addresses.
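The sizing arithmetic in this section reduces to a couple of formulas, sketched below. The helper assumes the five AWS-reserved addresses per subnet described above:

```python
# Subnet sizing arithmetic from the discussion above. AWS reserves the first
# four IP addresses and the last IP address in every subnet, so five total.

RESERVED_PER_SUBNET = 5

def usable_ips(mask):
    """Usable IP addresses in a subnet with the given CIDR mask."""
    return 2 ** (32 - mask) - RESERVED_PER_SUBNET

def smallest_sufficient_mask(workers):
    """Largest (most specific) mask that still fits one IP per worker."""
    for mask in range(28, 15, -1):  # try the smallest subnets first
        if usable_ips(mask) >= workers:
            return mask
    raise ValueError("workers exceed what a /16 subnet can hold")

print(usable_ips(22))                  # 1019 usable addresses in a /22
print(smallest_sufficient_mask(1000))  # 22: a /22 fits 1,000 workers
print(smallest_sufficient_mask(250))   # 24: a /24 fits 250 workers
```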

# Amazon EMR Serverless architecture options
<a name="architecture"></a>

The instruction set architecture of your Amazon EMR Serverless application determines the type of processors that the application uses to run the job. Amazon EMR provides two architecture options for your application: **x86_64** and **arm64**. EMR Serverless automatically updates to the latest generation of instances as they become available, so your applications can use the newer instances without additional effort from you.

**Topics**
+ [Using x86_64 architecture](#x86)
+ [Using arm64 architecture (Graviton)](#arm64)
+ [Launching new applications with Graviton support](#arm64-new)
+ [Configuring existing applications to use Graviton](#arm64-existing)
+ [Considerations when using Graviton](#arm64-considerations)

## Using x86_64 architecture
<a name="x86"></a>

The **x86_64** architecture is also known as x86 64-bit or x64. **x86_64** is the default option for EMR Serverless applications. This architecture uses x86-based processors and is compatible with most third-party tools and libraries.

Most applications are compatible with the x86 hardware platform and can run successfully on the default **x86_64** architecture. However, if your application is compatible with 64-bit ARM, you can switch to **arm64** to use Graviton processors for improved performance, compute power, and memory. Instances that run on the arm64 architecture cost less than instances of equal size on the x86 architecture.

## Using arm64 architecture (Graviton)
<a name="arm64"></a>

AWS Graviton processors are custom designed by AWS with 64-bit Arm Neoverse cores and use the arm64 architecture (also known as AArch64 or 64-bit ARM). The AWS Graviton processors available on EMR Serverless include Graviton2 and Graviton3 processors. These processors deliver superior price-performance for Spark and Hive workloads compared to equivalent workloads that run on the x86_64 architecture. EMR Serverless automatically uses the latest generation of processors when they're available, without any upgrade effort on your part.

## Launching new applications with Graviton support
<a name="arm64-new"></a>

Use one of the following methods to launch an application that uses the **arm64** architecture.

------
#### [ AWS CLI ]

To launch an application that uses Graviton processors from the AWS CLI, specify `ARM64` for the `architecture` parameter in the `create-application` API. Provide the appropriate values for your application in the other parameters.

```
aws emr-serverless create-application \
 --name my-graviton-app \
 --release-label emr-6.8.0 \
 --type "SPARK" \
 --architecture "ARM64" \
 --region us-west-2
```

------
#### [ EMR Studio ]

To launch an application using Graviton processors from EMR Studio, choose **arm64** as the **Architecture** option when you create or update an application.

------

## Configuring existing applications to use Graviton
<a name="arm64-existing"></a>

You can configure your existing Amazon EMR Serverless applications to use the Graviton (arm64) architecture with the SDK, AWS CLI, or EMR Studio.

**To convert an existing application from x86 to arm64**

1. Confirm that you are using the latest major version of the [AWS CLI/SDK](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/emr-serverless/index.html#cli-aws-emr-serverless) that supports the `architecture` parameter.

1. Confirm that there are no jobs running and then stop the application.

   ```
   aws emr-serverless stop-application \
    --application-id application-id \
    --region us-west-2
   ```

1. To update the application to use Graviton, specify `ARM64` for the `architecture` parameter in the `update-application` API.

   ```
   aws emr-serverless update-application \
    --application-id application-id \
    --architecture 'ARM64' \
    --region us-west-2
   ```

1. To verify that the CPU architecture of the application is now ARM64, use the `get-application` API.

   ```
   aws emr-serverless get-application \
    --application-id application-id \
    --region us-west-2
   ```

1. When you're ready, restart the application.

   ```
   aws emr-serverless start-application \
    --application-id application-id \
    --region us-west-2
   ```

## Considerations when using Graviton
<a name="arm64-considerations"></a>

Before you launch an EMR Serverless application using arm64 for Graviton support, confirm the following.

### Library compatibility
<a name="arm64-prereqs-library"></a>

When you select Graviton (arm64) as an architecture option, ensure that third-party packages and libraries are compatible with the 64-bit ARM architecture. For information on how to package Python libraries into a Python virtual environment that is compatible with your selected architecture, refer to [Using Python libraries with EMR Serverless](using-python-libraries.md).

To learn more, refer to the [AWS Graviton Getting Started](https://github.com/aws/aws-graviton-getting-started) repository on GitHub. This repository contains essential resources that can help you get started with the ARM-based Graviton.

# Job concurrency and queuing for an EMR Serverless application
<a name="applications-concurrency-queuing"></a>

With Amazon EMR releases 7.0.0 and later, you can specify a job run queue timeout and concurrency configuration for your application. When you specify this configuration, EMR Serverless queues your submitted jobs and starts them based on the concurrency available on your application. For example, if your job run concurrency is 10, the application runs at most ten jobs at a time; the remaining jobs stay queued until one of the running jobs terminates. If a queued job reaches the queue timeout before it starts, the job times out. For more information, refer to [Job run states](job-states.html).
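As a simplified mental model of this behavior (a toy sketch, not the actual scheduler implementation):

```python
# Toy model of job concurrency: with max_concurrent_runs = 10, the first
# ten submissions run and the rest wait in the queue.

def schedule(jobs, max_concurrent_runs):
    """Split submitted jobs into running and queued sets."""
    running = jobs[:max_concurrent_runs]
    queued = jobs[max_concurrent_runs:]
    return running, queued

jobs = [f"job-{i}" for i in range(12)]
running, queued = schedule(jobs, max_concurrent_runs=10)
print(len(running), len(queued))  # 10 2
```

In the real service, a queued job also times out if it waits longer than the configured queue timeout.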

## Key benefits of concurrency and queuing
<a name="applications-concurrency-key-benefits"></a>

Job concurrency and queuing provides the following benefits when you submit many jobs:
+ It helps control the number of concurrently executing jobs so that you use your application-level capacity limits efficiently.
+ The queue can absorb a sudden burst of job submissions, with a configurable timeout setting.

## Getting started with concurrency and queuing
<a name="applications-concurrency-getting-started"></a>

The following procedures demonstrate two ways to implement concurrency and queuing.

**Using the AWS CLI**

1. Create an Amazon EMR Serverless application with queue timeout and concurrent job runs:

   ```
   aws emr-serverless create-application \
   --release-label emr-7.0.0 \
   --type SPARK \
   --scheduler-configuration '{"maxConcurrentRuns": 1, "queueTimeoutMinutes": 30}'
   ```

1. Update an application to change the job queue timeout and concurrency:

   ```
   aws emr-serverless update-application \
   --application-id application-id \
   --scheduler-configuration '{"maxConcurrentRuns": 5, "queueTimeoutMinutes": 30}'
   ```
**Note**  
You can update your existing application to enable job concurrency and queuing. To do this, the application must have a release label *emr-7.0.0* or later.

**Using the AWS Management Console**

The following steps demonstrate how to get started with job concurrency and queuing, using the AWS Management Console: 

1. Go to EMR Studio and create an application with release label emr-7.0.0 or later.

1. Under **Application setup options**, select the option **Use custom settings**.

1. Under **Additional configurations**, find the **Job Run Settings** section. Select **Enable job concurrency** to turn on the feature.

1. Then specify **Concurrent job runs** and **Queue timeout** to configure the number of concurrent job runs and the queue timeout, respectively. If you don't enter values for these settings, EMR Serverless uses the default values.

1. Choose **Create Application** to create the application with this feature enabled. To verify, go to the dashboard, select your application, and check the **Properties** tab to confirm that the feature is enabled.

After you configure the application, you can submit jobs with this feature enabled.

## Considerations for concurrency and queuing
<a name="applications-concurrency-considerations"></a>

Take the following into consideration when you implement concurrency and queuing:
+ Job concurrency and queuing is supported on Amazon EMR release 7.0.0 and higher.
+ Job concurrency and queuing is enabled by default on Amazon EMR release 7.3.0 and higher.
+ You cannot update concurrency for an application in the **STARTED** state.
+ The valid range for `maxConcurrentRuns` is 1 to 1000, and for `queueTimeoutMinutes` it is 15 to 720.
+ A maximum of 2000 jobs can be in the **QUEUED** state for an account.
+ Concurrency and queuing applies to batch and streaming jobs. It cannot be used for interactive jobs. For more information, refer to [Run interactive workloads with EMR Serverless through EMR Studio](interactive-workloads.html).
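Because the API rejects out-of-range values, you may want to validate a `schedulerConfiguration` client-side before you call `create-application` or `update-application`. The following sketch checks the documented ranges; the function name is hypothetical:

```python
# Validate schedulerConfiguration values against the documented ranges:
# maxConcurrentRuns 1-1000, queueTimeoutMinutes 15-720.

def validate_scheduler_configuration(config):
    """Return a list of error messages; an empty list means valid."""
    errors = []
    runs = config.get("maxConcurrentRuns")
    if runs is not None and not 1 <= runs <= 1000:
        errors.append("maxConcurrentRuns must be between 1 and 1000")
    timeout = config.get("queueTimeoutMinutes")
    if timeout is not None and not 15 <= timeout <= 720:
        errors.append("queueTimeoutMinutes must be between 15 and 720")
    return errors

# A valid configuration produces no errors; out-of-range values are flagged.
assert validate_scheduler_configuration(
    {"maxConcurrentRuns": 5, "queueTimeoutMinutes": 30}) == []
assert validate_scheduler_configuration({"queueTimeoutMinutes": 5})
```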