AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal.
EmrConfiguration
The EmrConfiguration object is the configuration used for EMR clusters with release 4.0.0 or later. Configurations (as a list) is a parameter to the RunJobFlow API call. The configuration API for Amazon EMR takes a classification and properties. AWS Data Pipeline uses EmrConfiguration with corresponding Property objects to configure an EmrCluster application such as Hadoop, Hive, Spark, or Pig on EMR clusters launched in a pipeline execution. Because configuration can be changed only for new clusters, you cannot provide an EmrConfiguration object for existing resources. For more information, see http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/.
Example
The following configuration object sets the io.file.buffer.size and fs.s3.block.size properties in core-site.xml:
[
  {
    "classification": "core-site",
    "properties": {
      "io.file.buffer.size": "4096",
      "fs.s3.block.size": "67108864"
    }
  }
]
The corresponding pipeline object definition uses an EmrConfiguration object and a list of Property objects in the property field:
{
  "objects": [
    {
      "name": "ReleaseLabelCluster",
      "releaseLabel": "emr-4.1.0",
      "applications": ["spark", "hive", "pig"],
      "id": "ResourceId_I1mCc",
      "type": "EmrCluster",
      "configuration": {
        "ref": "coresite"
      }
    },
    {
      "name": "coresite",
      "id": "coresite",
      "type": "EmrConfiguration",
      "classification": "core-site",
      "property": [
        { "ref": "io-file-buffer-size" },
        { "ref": "fs-s3-block-size" }
      ]
    },
    {
      "name": "io-file-buffer-size",
      "id": "io-file-buffer-size",
      "type": "Property",
      "key": "io.file.buffer.size",
      "value": "4096"
    },
    {
      "name": "fs-s3-block-size",
      "id": "fs-s3-block-size",
      "type": "Property",
      "key": "fs.s3.block.size",
      "value": "67108864"
    }
  ]
}
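The mapping between the two forms above is mechanical: each entry in properties becomes its own Property object, and the EmrConfiguration object refers to those objects by id. The following Python sketch illustrates that mapping. The function name to_pipeline_objects and the id scheme (dots and underscores replaced with dashes) are assumptions made for illustration; AWS Data Pipeline does not prescribe how ids are chosen, and this is not part of any AWS SDK.

```python
import json


def to_pipeline_objects(classification, properties, config_id):
    """Build one EmrConfiguration object plus one Property object per key.

    The id scheme used here (dots and underscores replaced with dashes,
    lowercased) is an illustrative assumption, not an AWS requirement.
    """
    property_objects = []
    for key, value in properties.items():
        prop_id = key.replace(".", "-").replace("_", "-").lower()
        property_objects.append({
            "name": prop_id,
            "id": prop_id,
            "type": "Property",
            "key": key,
            "value": str(value),
        })

    configuration = {
        "name": config_id,
        "id": config_id,
        "type": "EmrConfiguration",
        "classification": classification,
        # The EmrConfiguration points at its Property objects by id.
        "property": [{"ref": p["id"]} for p in property_objects],
    }
    return [configuration] + property_objects


# The core-site configuration from the example above, in EMR's own form.
emr_configuration = {
    "classification": "core-site",
    "properties": {
        "io.file.buffer.size": "4096",
        "fs.s3.block.size": "67108864",
    },
}

objects = to_pipeline_objects(
    emr_configuration["classification"],
    emr_configuration["properties"],
    config_id="coresite",
)
print(json.dumps({"objects": objects}, indent=2))
```

The printed objects correspond to the coresite, io-file-buffer-size, and fs-s3-block-size entries in the pipeline definition; the EmrCluster object that references coresite still has to be written separately.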
The following example is a nested configuration used to set the Hadoop environment with the hadoop-env classification:
[
  {
    "classification": "hadoop-env",
    "properties": {},
    "configurations": [
      {
        "classification": "export",
        "properties": {
          "YARN_PROXYSERVER_HEAPSIZE": "2396"
        }
      }
    ]
  }
]
The corresponding pipeline definition object that uses this configuration is as follows:
{
  "objects": [
    {
      "name": "ReleaseLabelCluster",
      "releaseLabel": "emr-4.0.0",
      "applications": ["spark", "hive", "pig"],
      "id": "ResourceId_I1mCc",
      "type": "EmrCluster",
      "configuration": {
        "ref": "hadoop-env"
      }
    },
    {
      "name": "hadoop-env",
      "id": "hadoop-env",
      "type": "EmrConfiguration",
      "classification": "hadoop-env",
      "configuration": {
        "ref": "export"
      }
    },
    {
      "name": "export",
      "id": "export",
      "type": "EmrConfiguration",
      "classification": "export",
      "property": {
        "ref": "yarn-proxyserver-heapsize"
      }
    },
    {
      "name": "yarn-proxyserver-heapsize",
      "id": "yarn-proxyserver-heapsize",
      "type": "Property",
      "key": "YARN_PROXYSERVER_HEAPSIZE",
      "value": "2396"
    }
  ]
}
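To check that a nested pipeline definition expresses the intended EMR configuration, it can help to expand the references back into the classification and properties form. The Python sketch below does this by following configuration and property references recursively. It is a reading aid under stated assumptions, not a description of how AWS Data Pipeline resolves objects internally: the helper names as_list and resolve_configuration are hypothetical, and the code does not guard against circular references.

```python
def as_list(value):
    """Return a field value as a list; a field may hold one reference or several."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def resolve_configuration(config_id, objects_by_id):
    """Expand an EmrConfiguration object into classification/properties form."""
    obj = objects_by_id[config_id]
    if obj["type"] != "EmrConfiguration":
        raise ValueError(f"{config_id} is not an EmrConfiguration")

    # Collect key/value pairs from every referenced Property object.
    properties = {}
    for ref in as_list(obj.get("property")):
        prop = objects_by_id[ref["ref"]]
        properties[prop["key"]] = prop["value"]

    resolved = {"classification": obj["classification"], "properties": properties}

    # Sub-configurations are themselves EmrConfiguration objects, so recurse.
    nested = [
        resolve_configuration(ref["ref"], objects_by_id)
        for ref in as_list(obj.get("configuration"))
    ]
    if nested:
        resolved["configurations"] = nested
    return resolved


# The hadoop-env objects from the pipeline definition above (name fields omitted).
pipeline_objects = [
    {"id": "hadoop-env", "type": "EmrConfiguration",
     "classification": "hadoop-env", "configuration": {"ref": "export"}},
    {"id": "export", "type": "EmrConfiguration",
     "classification": "export", "property": {"ref": "yarn-proxyserver-heapsize"}},
    {"id": "yarn-proxyserver-heapsize", "type": "Property",
     "key": "YARN_PROXYSERVER_HEAPSIZE", "value": "2396"},
]
objects_by_id = {obj["id"]: obj for obj in pipeline_objects}

resolved = resolve_configuration("hadoop-env", objects_by_id)
print(resolved)
```

For the objects shown, the result has the same content as the nested hadoop-env configuration: an empty properties map at the top level and a single export sub-configuration that sets YARN_PROXYSERVER_HEAPSIZE.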
The following example modifies a Hive-specific property for an EMR cluster:
{
  "objects": [
    {
      "name": "hivesite",
      "id": "hivesite",
      "type": "EmrConfiguration",
      "classification": "hive-site",
      "property": [
        { "ref": "hive-client-timeout" }
      ]
    },
    {
      "name": "hive-client-timeout",
      "id": "hive-client-timeout",
      "type": "Property",
      "key": "hive.metastore.client.socket.timeout",
      "value": "2400s"
    }
  ]
}
Syntax
This object includes the following fields.
Required Fields | Description | Slot Type |
---|---|---|
classification | Classification for the configuration. | String |

Optional Fields | Description | Slot Type |
---|---|---|
configuration | Sub-configuration for this configuration. | Reference Object, e.g. "configuration":{"ref":"myEmrConfigurationId"} |
parent | Parent of the current object from which slots will be inherited. | Reference Object, e.g. "parent":{"ref":"myBaseObjectId"} |
property | Configuration property. | Reference Object, e.g. "property":{"ref":"myPropertyId"} |

Runtime Fields | Description | Slot Type |
---|---|---|
@version | Pipeline version the object was created with. | String |

System Fields | Description | Slot Type |
---|---|---|
@error | Error describing the ill-formed object. | String |
@pipelineId | ID of the pipeline to which this object belongs. | String |
@sphere | The sphere of an object denotes its place in the lifecycle: Component Objects give rise to Instance Objects, which execute Attempt Objects. | String |
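The field rules above can be checked locally before a pipeline definition is uploaded. The following Python sketch tests one EmrConfiguration object against them: classification must be present, and configuration and property must reference objects of the expected type. The function validate_emr_configuration and its messages are hypothetical, written only for illustration. The sketch also assumes that classification is set directly on the object, so an object that inherits it through parent would be reported as incomplete.

```python
def validate_emr_configuration(obj, objects_by_id):
    """Return a list of problems found in one EmrConfiguration object."""
    problems = []

    # classification is the only required field.
    classification = obj.get("classification")
    if not isinstance(classification, str) or not classification:
        problems.append("classification is required and must be a non-empty string")

    # Each optional reference field must point at an object of the right type.
    expected_types = {"configuration": "EmrConfiguration", "property": "Property"}
    for field, expected_type in expected_types.items():
        value = obj.get(field)
        if value is None:
            continue
        refs = value if isinstance(value, list) else [value]
        for ref in refs:
            target_id = ref.get("ref") if isinstance(ref, dict) else None
            target = objects_by_id.get(target_id)
            if target is None:
                problems.append(f"{field} references unknown object {target_id!r}")
            elif target.get("type") != expected_type:
                problems.append(
                    f"{field} must reference a {expected_type}, "
                    f"got {target.get('type')!r}"
                )
    return problems


# Sample objects: the hive-site example plus a deliberately broken object.
sample_objects = {
    "hivesite": {
        "id": "hivesite", "type": "EmrConfiguration",
        "classification": "hive-site",
        "property": [{"ref": "hive-client-timeout"}],
    },
    "hive-client-timeout": {
        "id": "hive-client-timeout", "type": "Property",
        "key": "hive.metastore.client.socket.timeout", "value": "2400s",
    },
    "broken": {
        "id": "broken", "type": "EmrConfiguration",
        "property": {"ref": "missing"},
    },
}

valid_problems = validate_emr_configuration(sample_objects["hivesite"], sample_objects)
broken_problems = validate_emr_configuration(sample_objects["broken"], sample_objects)
print(valid_problems)
print(broken_problems)
```

The hivesite object passes with no problems reported, while the broken object is flagged twice: once for the missing classification and once for the unresolved property reference.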