Default application configuration for EMR Serverless
You can specify a common set of runtime and monitoring configurations at the application level for all the jobs that you submit under the same application. This reduces the overhead of submitting the same configurations with every job.
You can declare these configurations when you create or update an application, and you can override them when you start a job run. The following sections provide more details and an example for further context.
Declaring configurations at the application level
You can specify application-level logging and runtime configuration properties for the jobs that you submit under the application.
- `monitoringConfiguration` – To specify the log configurations for jobs that you submit with the application, use the `monitoringConfiguration` field. For more information on logging for EMR Serverless, see Storing logs.
- `runtimeConfiguration` – To specify runtime configuration properties such as `spark-defaults`, provide a configuration object in the `runtimeConfiguration` field. This affects the default configurations for all the jobs that you submit with the application. For more information, see Hive configuration override parameter and Spark configuration override parameter. Available configuration classifications vary by specific EMR Serverless release. For example, the classifications for custom Log4j `spark-driver-log4j2` and `spark-executor-log4j2` are only available with releases 6.8.0 and higher. For a list of application-specific properties, see Spark job properties and Hive job properties.

You can also configure Apache Log4j2 properties, AWS Secrets Manager for data protection, and the Java 17 runtime at the application level.
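Expressed as plain Python data, the two fields take the following shapes. This is an illustrative sketch with placeholder values (the bucket name and Spark property values are not recommendations); serialized, these dicts match the JSON that the CLI options accept.

```python
import json

# Illustrative placeholder values only; not tuning recommendations.
runtime_configuration = [
    {
        "classification": "spark-defaults",
        "properties": {
            "spark.driver.cores": "4",
            "spark.executor.memory": "8G",
        },
    }
]

monitoring_configuration = {
    "s3MonitoringConfiguration": {
        "logUri": "s3://amzn-s3-demo-logging-bucket/logs/app-level"
    },
    "managedPersistenceMonitoringConfiguration": {"enabled": False},
}

# Serialized, these are the literal JSON arguments passed to the
# --runtime-configuration and --monitoring-configuration CLI options.
runtime_json = json.dumps(runtime_configuration)
monitoring_json = json.dumps(monitoring_configuration)
```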
To pass Secrets Manager secrets at the application level, attach the following policy to users and roles that need to create or update EMR Serverless applications with secrets.
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SecretsManagerPolicy",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "kms:Decrypt"
            ],
            "Resource": "arn:aws:secretsmanager:your-secret-arn"
        }
    ]
}
```

For more information on creating custom policies for secrets, see Permissions policy examples for AWS Secrets Manager in the AWS Secrets Manager User Guide.
Note

The `runtimeConfiguration` that you specify at the application level maps to `applicationConfiguration` in the `StartJobRun` API.
Example declaration
The following example shows how to declare default configurations with `create-application`.
```shell
aws emr-serverless create-application \
    --release-label release-version \
    --type SPARK \
    --name my-application-name \
    --runtime-configuration '[
        {
            "classification": "spark-defaults",
            "properties": {
                "spark.driver.cores": "4",
                "spark.executor.cores": "2",
                "spark.driver.memory": "8G",
                "spark.executor.memory": "8G",
                "spark.executor.instances": "2",
                "spark.hadoop.javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
                "spark.hadoop.javax.jdo.option.ConnectionURL": "jdbc:mysql://db-host:db-port/db-name",
                "spark.hadoop.javax.jdo.option.ConnectionUserName": "connection-user-name",
                "spark.hadoop.javax.jdo.option.ConnectionPassword": "EMR.secret@SecretID"
            }
        },
        {
            "classification": "spark-driver-log4j2",
            "properties": {
                "rootLogger.level": "error",
                "logger.IdentifierForClass.name": "classpathForSettingLogger",
                "logger.IdentifierForClass.level": "info"
            }
        }
    ]' \
    --monitoring-configuration '{
        "s3MonitoringConfiguration": {
            "logUri": "s3://amzn-s3-demo-logging-bucket/logs/app-level"
        },
        "managedPersistenceMonitoringConfiguration": {
            "enabled": false
        }
    }'
```
Overriding configurations during a job run
You can specify configuration overrides for the application configuration and monitoring
configuration with the StartJobRun
API. EMR Serverless then merges the configurations that
you specify at the application level and the job level to determine the configurations for
the job execution.
The merge occurs at the following level of granularity:

- `ApplicationConfiguration` – classification type, for example `spark-defaults`.
- `MonitoringConfiguration` – configuration type, for example `s3MonitoringConfiguration`.
Note

Configurations that you provide at `StartJobRun` supersede the configurations that you provide at the application level.
For more information on priority rankings, see Hive configuration override parameter and Spark configuration override parameter.

When you start a job, if you don't specify a particular configuration, it is inherited from the application. If you declare configurations at the job level, you can perform the following operations:
- Override an existing configuration – Provide the same configuration parameter in the `StartJobRun` request with your override values.
- Add an additional configuration – Add the new configuration parameter in the `StartJobRun` request with the values that you want to specify.
- Remove an existing configuration – To remove an application runtime configuration, provide the key for the configuration that you want to remove, and pass an empty declaration `{}` for the configuration. We don't recommend removing any classifications that contain parameters that are required for a job run. For example, if you try to remove the required properties for a Hive job, the job will fail. To remove an application monitoring configuration, use the appropriate method for the relevant configuration type:
  - `cloudWatchLoggingConfiguration` – To remove `cloudWatchLogging`, pass the enabled flag as `false`.
  - `managedPersistenceMonitoringConfiguration` – To remove managed persistence settings and fall back to the default enabled state, pass an empty declaration `{}` for the configuration.
  - `s3MonitoringConfiguration` – To remove `s3MonitoringConfiguration`, pass an empty declaration `{}` for the configuration.
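The classification-level merge that these operations describe can be modeled in a few lines of plain Python. This is an illustrative sketch of the documented behavior, not EMR Serverless source code: a job-level classification replaces the application-level one of the same name, a new classification is added, and an empty properties object removes the classification.

```python
def merge_application_configuration(app_config, job_config):
    """Sketch of the StartJobRun merge for applicationConfiguration.

    Both arguments are lists of {"classification": ..., "properties": {...}}
    objects. A job-level classification replaces the application-level one of
    the same name (only the job-level properties are kept), a new
    classification is added, and an empty properties object {} removes the
    classification. Illustrative model only, not EMR Serverless code.
    """
    merged = {c["classification"]: c["properties"] for c in app_config}
    for c in job_config:
        name, props = c["classification"], c["properties"]
        if props:
            merged[name] = props      # override an existing classification, or add a new one
        else:
            merged.pop(name, None)    # empty {} removes the classification
    return [{"classification": n, "properties": p} for n, p in merged.items()]


# Hypothetical application- and job-level declarations for illustration.
app = [{"classification": "spark-defaults",
        "properties": {"spark.driver.cores": "4", "spark.executor.memory": "8G"}}]
job = [{"classification": "spark-defaults",
        "properties": {"spark.driver.cores": "2"}},        # override
       {"classification": "spark-executor-log4j2",
        "properties": {"rootLogger.level": "error"}}]      # add
effective = merge_application_configuration(app, job)
# spark-defaults now carries only the job-level properties.
```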
Example override
The following example shows different operations that you can perform during job submission with `start-job-run`.
```shell
aws emr-serverless start-job-run \
    --application-id your-application-id \
    --execution-role-arn your-job-role-arn \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://us-east-1.elasticmapreduce/emr-containers/samples/wordcount/scripts/wordcount.py",
            "entryPointArguments": ["s3://amzn-s3-demo-destination-bucket1/wordcount_output"]
        }
    }' \
    --configuration-overrides '{
        "applicationConfiguration": [
            {
                // Override existing configuration for spark-defaults in the application
                "classification": "spark-defaults",
                "properties": {
                    "spark.driver.cores": "2",
                    "spark.executor.cores": "1",
                    "spark.driver.memory": "4G",
                    "spark.executor.memory": "4G"
                }
            },
            {
                // Add configuration for spark-executor-log4j2
                "classification": "spark-executor-log4j2",
                "properties": {
                    "rootLogger.level": "error",
                    "logger.IdentifierForClass.name": "classpathForSettingLogger",
                    "logger.IdentifierForClass.level": "info"
                }
            },
            {
                // Remove existing configuration for spark-driver-log4j2 from the application
                "classification": "spark-driver-log4j2",
                "properties": {}
            }
        ],
        "monitoringConfiguration": {
            "managedPersistenceMonitoringConfiguration": {
                // Override existing configuration for managed persistence
                "enabled": true
            },
            "s3MonitoringConfiguration": {
                // Remove configuration of S3 monitoring
            },
            "cloudWatchLoggingConfiguration": {
                // Add configuration for CloudWatch logging
                "enabled": true
            }
        }
    }'
```
At the time of job execution, the following classifications and configurations will apply based on the priority override ranking described in Hive configuration override parameter and Spark configuration override parameter.
- The classification `spark-defaults` will be updated with the properties specified at the job level. Only the properties included in `StartJobRun` will be considered for this classification.
- The classification `spark-executor-log4j2` will be added to the existing list of classifications.
- The classification `spark-driver-log4j2` will be removed.
- The configurations for `managedPersistenceMonitoringConfiguration` will be updated with the configurations at the job level.
- The configurations for `s3MonitoringConfiguration` will be removed.
- The configurations for `cloudWatchLoggingConfiguration` will be added to the existing monitoring configurations.
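The monitoring half of this outcome can be modeled the same way. The following Python sketch illustrates the documented per-configuration-type merge and reproduces the example above; it is a model only, not EMR Serverless code, and it simplifies one detail: in the real service, passing `{}` for managed persistence falls back to its default enabled state rather than removing monitoring entirely.

```python
def merge_monitoring(app_mon, job_mon):
    """Sketch of the StartJobRun merge for monitoringConfiguration.

    Merging happens per configuration type: a job-level entry replaces the
    application-level entry of the same type, and an empty {} removes it.
    Illustrative model only; the real service falls back to defaults for
    managed persistence instead of removing it outright.
    """
    merged = dict(app_mon)
    for key, value in job_mon.items():
        if value:  # override an existing entry, or add a new one
            merged[key] = value
        else:      # empty {} removes the application-level entry
            merged.pop(key, None)
    return merged


# Application-level settings from the create-application example above.
app_mon = {
    "s3MonitoringConfiguration": {"logUri": "s3://amzn-s3-demo-logging-bucket/logs/app-level"},
    "managedPersistenceMonitoringConfiguration": {"enabled": False},
}

# Job-level overrides from the start-job-run example above.
job_mon = {
    "managedPersistenceMonitoringConfiguration": {"enabled": True},  # override
    "s3MonitoringConfiguration": {},                                 # remove
    "cloudWatchLoggingConfiguration": {"enabled": True},             # add
}

effective = merge_monitoring(app_mon, job_mon)
```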