Install additional software on your Amazon EMR cluster - AWS Data Pipeline

AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal. Learn more

Install additional software on your Amazon EMR cluster

EmrCluster provides the supportedProducts field that installs third-party software on an Amazon EMR cluster, for example, it lets you install a custom distribution of Hadoop, such as MapR. It accepts a comma-separated list of arguments for the third-party software to read and act on. The following example shows how to use the supportedProducts field of EmrCluster to create a custom MapR M3 edition cluster with Karmasphere Analytics installed, and run an EmrActivity object on it.

{ "id": "MyEmrActivity", "type": "EmrActivity", "schedule": {"ref": "ResourcePeriod"}, "runsOn": {"ref": "MyEmrCluster"}, "postStepCommand": "echo Ending job >> /mnt/var/log/stepCommand.txt", "preStepCommand": "echo Starting job > /mnt/var/log/stepCommand.txt", "step": "/home/hadoop/contrib/streaming/hadoop-streaming.jar,-input,s3n://elasticmapreduce/samples/wordcount/input,-output, \ hdfs:///output32113/,-mapper,s3n://elasticmapreduce/samples/wordcount/wordSplitter.py,-reducer,aggregate" }, { "id": "MyEmrCluster", "type": "EmrCluster", "schedule": {"ref": "ResourcePeriod"}, "supportedProducts": ["mapr,--edition,m3,--version,1.2,--key1,value1","karmasphere-enterprise-utility"], "masterInstanceType": "m3.xlarge", "taskInstanceType": "m3.xlarge" }