AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal. Learn more
Install additional software on your Amazon EMR cluster
EmrCluster
provides the supportedProducts
field
that installs third-party software on an Amazon EMR cluster, for example, it lets
you install a custom distribution of Hadoop, such as MapR. It accepts a
comma-separated list of arguments for the third-party software to read and
act on. The following example shows how to use the
supportedProducts
field of EmrCluster
to
create a custom MapR M3 edition cluster with Karmasphere Analytics
installed, and run an EmrActivity
object on it.
{ "id": "MyEmrActivity", "type": "EmrActivity", "schedule": {"ref": "ResourcePeriod"}, "runsOn": {"ref": "MyEmrCluster"}, "postStepCommand": "echo Ending job >> /mnt/var/log/stepCommand.txt", "preStepCommand": "echo Starting job > /mnt/var/log/stepCommand.txt", "step": "/home/hadoop/contrib/streaming/hadoop-streaming.jar,-input,s3n://elasticmapreduce/samples/wordcount/input,-output, \ hdfs:///output32113/,-mapper,s3n://elasticmapreduce/samples/wordcount/wordSplitter.py,-reducer,aggregate" }, { "id": "MyEmrCluster", "type": "EmrCluster", "schedule": {"ref": "ResourcePeriod"}, "supportedProducts": ["mapr,--edition,m3,--version,1.2,--key1,value1","karmasphere-enterprise-utility"], "masterInstanceType": "m3.xlarge", "taskInstanceType": "m3.xlarge" }