Adding steps to an Amazon EMR cluster with the AWS CLI
The following procedures demonstrate how to add steps to a newly created cluster and
to a running cluster with the AWS CLI. Both examples use the --steps
option to add steps to the cluster.
To add steps during cluster creation
Type the following command to create a cluster and add an Apache Spark step. Make sure to replace
myKey with the name of your Amazon EC2 key pair.
    aws emr create-cluster --name "Test cluster" \
      --applications Name=Spark \
      --use-default-roles \
      --ec2-attributes KeyName=myKey \
      --instance-groups InstanceGroupType=PRIMARY,InstanceCount=1,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
      --steps '[{"Args":["spark-submit","--deploy-mode","cluster","--class","org.apache.spark.examples.SparkPi","/usr/lib/spark/examples/jars/spark-examples.jar","5"],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Name":"Spark application"}]'
Note: The list of arguments changes depending on the type of step.
By default, the step concurrency level is 1. You can set the step concurrency level with the
StepConcurrencyLevel parameter when you create a cluster.
The output is a cluster identifier similar to the following.
{ "ClusterId": "j-2AXXXXXXGAPLF" }
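The StepConcurrencyLevel setting mentioned in the note above can be supplied when the cluster is created. The following is a minimal sketch rather than a definitive recipe. It assumes the --step-concurrency-level option of create-cluster and the global --query and --output options of the AWS CLI. The function name create_test_cluster and the level of 5 in the usage comment are hypothetical, chosen only for illustration. The step definition is left out to keep the sketch short.

```shell
#!/usr/bin/env bash
# Hypothetical helper: creates the example cluster with a step concurrency
# level of "$1" (defaults to 1) and prints only the new cluster ID.
create_test_cluster() {
  local level="${1:-1}"
  aws emr create-cluster --name "Test cluster" \
    --applications Name=Spark \
    --use-default-roles \
    --ec2-attributes KeyName=myKey \
    --instance-groups \
      InstanceGroupType=PRIMARY,InstanceCount=1,InstanceType=m5.xlarge \
      InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
    --step-concurrency-level "$level" \
    --query 'ClusterId' --output text
}

# Usage (requires configured AWS credentials; replace myKey first):
#   cluster_id="$(create_test_cluster 5)"
```

Printing the ID as plain text makes it easy to store in a shell variable and reuse in later commands, such as add-steps.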
To add a step to a running cluster
Type the following command to add a step to a running cluster. Replace
j-2AXXXXXXGAPLF with your own cluster ID.
    aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
      --steps '[{"Args":["spark-submit","--deploy-mode","cluster","--class","org.apache.spark.examples.SparkPi","/usr/lib/spark/examples/jars/spark-examples.jar","5"],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Name":"Spark application"}]'
The output is a step identifier similar to the following.
{ "StepIds": [ "s-Y9XXXXXXAPMD" ] }
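Long inline JSON is easy to misquote in a shell. As one possible alternative, the same step definition can be kept in a file and passed with the file:// prefix that the AWS CLI accepts for parameter values. This is a sketch under stated assumptions: the file name spark-step.json and the function name add_steps_from_file are hypothetical, and the JSON content is copied from the example above.

```shell
#!/usr/bin/env bash
# Write the Spark step from the example above to a file (file name is arbitrary).
STEP_FILE="${STEP_FILE:-./spark-step.json}"
cat > "$STEP_FILE" <<'EOF'
[
  {
    "Name": "Spark application",
    "Type": "CUSTOM_JAR",
    "Jar": "command-runner.jar",
    "ActionOnFailure": "CONTINUE",
    "Properties": "",
    "Args": [
      "spark-submit", "--deploy-mode", "cluster",
      "--class", "org.apache.spark.examples.SparkPi",
      "/usr/lib/spark/examples/jars/spark-examples.jar", "5"
    ]
  }
]
EOF

# Hypothetical helper: submits the steps in file "$2" to cluster "$1"
# and prints the resulting step IDs as plain text.
add_steps_from_file() {
  aws emr add-steps --cluster-id "$1" \
    --steps "file://$2" \
    --query 'StepIds' --output text
}

# Usage (requires configured AWS credentials and a running cluster):
#   add_steps_from_file j-2AXXXXXXGAPLF "$STEP_FILE"
```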
To modify the StepConcurrencyLevel in a running cluster
In a running cluster, you can modify the StepConcurrencyLevel with the ModifyCluster
API. For example, type the following command to increase the StepConcurrencyLevel
to 10. Replace j-2AXXXXXXGAPLF with your cluster ID.
    aws emr modify-cluster --cluster-id j-2AXXXXXXGAPLF \
      --step-concurrency-level 10
The output is similar to the following.
{ "StepConcurrencyLevel": 10 }
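To confirm that the change took effect, the new value can be read back from the cluster description. The sketch below assumes that describe-cluster reports the level under Cluster.StepConcurrencyLevel and uses the global --query and --output options. The function name set_step_concurrency is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sets the step concurrency level of cluster "$1" to "$2",
# then prints the level that the cluster reports afterwards.
set_step_concurrency() {
  local cluster_id="$1"
  local level="$2"
  # Discard the JSON response; stop here if the modification fails.
  aws emr modify-cluster --cluster-id "$cluster_id" \
    --step-concurrency-level "$level" > /dev/null || return 1
  aws emr describe-cluster --cluster-id "$cluster_id" \
    --query 'Cluster.StepConcurrencyLevel' --output text
}

# Usage (requires configured AWS credentials and a running cluster):
#   set_step_concurrency j-2AXXXXXXGAPLF 10
```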
For more information on using Amazon EMR commands in the AWS CLI, see the AWS CLI Command Reference.