AddJobFlowSteps adds new steps to a running job flow. A maximum of 256 steps are allowed in
each job flow.
If your job flow is long-running (such as a Hive data warehouse) or complex, you may require
more than 256 steps to process your data. You can bypass the 256-step limitation in various
ways, including using the SSH shell to connect to the master node and submitting queries
directly to the software running on the master node, such as Hive and Hadoop. For more
information on how to do this, see "Add More than 256 Steps to a Job Flow" in the
Amazon Elastic MapReduce Developer’s Guide.
A step specifies the location of a JAR file stored either on the master node of the job flow or
in Amazon S3. Each step is performed by the main function of the main class of the JAR file.
The main class can be specified either in the manifest of the JAR or by using the MainClass
parameter of the step.
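As a rough illustration of how a step ties a JAR location to a main class, the sketch below builds one step definition as a plain PHP array, using the key names listed under Parameters (Name, ActionOnFailure, HadoopJarStep, Jar, MainClass, Args). The helper build_step(), the bucket, the JAR path, and the class name are all hypothetical placeholders. The exact nesting is an assumption drawn from the parameter list, and the sketch does not show how the array is handed to add_job_flow_steps().

```php
<?php
// Hypothetical helper (not part of the SDK): builds one step definition as a
// plain array that follows the key names documented under Parameters.
function build_step($name, $jar, $main_class = null, array $args = array())
{
    // Jar is the only required key of HadoopJarStep.
    $hadoop_jar_step = array('Jar' => $jar);

    // MainClass is optional; when omitted, the JAR manifest must name a Main-Class.
    if ($main_class !== null) {
        $hadoop_jar_step['MainClass'] = $main_class;
    }

    // Args are passed to the main function as command line arguments.
    if (count($args) > 0) {
        $hadoop_jar_step['Args'] = $args;
    }

    return array(
        'Name'            => $name,
        'ActionOnFailure' => 'CANCEL_AND_WAIT',
        'HadoopJarStep'   => $hadoop_jar_step,
    );
}

// Placeholder values for illustration only.
$step = build_step(
    'Word count',
    's3://example-bucket/jars/wordcount.jar',
    'com.example.WordCount',
    array('s3://example-bucket/input', 's3://example-bucket/output')
);

print_r($step);
```

Leaving out the third argument produces a step without a MainClass key, which corresponds to the case where the JAR manifest names the main class.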
Elastic MapReduce executes each step in the order listed. For a step to be considered complete,
the main function must exit with a zero exit code and all Hadoop jobs started while the step
was running must have completed and run successfully.
You can only add steps to a job flow that is in one of the following states: STARTING,
BOOTSTRAPPING, RUNNING, or WAITING.
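The state restriction can be checked before a call is attempted. This minimal sketch assumes the job flow state is already available as a string (for example, taken from a DescribeJobFlows response, which is not shown here). The function name can_add_steps() is hypothetical and not part of the SDK.

```php
<?php
// Hypothetical guard: returns true only for the states in which steps may be added.
function can_add_steps($state)
{
    $allowed = array('STARTING', 'BOOTSTRAPPING', 'RUNNING', 'WAITING');

    // Strict comparison, so only exact string matches are accepted.
    return in_array($state, $allowed, true);
}

var_dump(can_add_steps('WAITING'));    // bool(true)
var_dump(can_add_steps('TERMINATED')); // bool(false): not one of the four listed states
```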
Access
Parameters
Parameter | Type | Required | Description
$job_flow_id | string | Required | A string that uniquely identifies the job flow. This identifier is returned by RunJobFlow and can also be obtained from DescribeJobFlows. [Constraints: The value must be between 0 and 256 characters, and must match the following regular expression pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]* ]
$steps | array | Required | A list of StepConfig to be executed by the job flow.
    x - array - Optional - This represents a simple array index.
        Name - string - Required - The name of the job flow step. [Constraints: The value must be between 0 and 256 characters, and must match the following regular expression pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]* ]
        ActionOnFailure - string - Optional - Specifies the action to take if the job flow step fails. [Allowed values: TERMINATE_JOB_FLOW, CANCEL_AND_WAIT, CONTINUE]
        HadoopJarStep - array - Required - Specifies the JAR file used for the job flow step.
            x - array - Optional - This represents a simple array index.
                Properties - array - Optional - A list of Java properties that are set when the step runs. You can use these properties to pass key value pairs to your main function.
                    x - array - Optional - This represents a simple array index.
                        Key - string - Optional - The unique identifier of a key value pair. [Constraints: The value must be between 0 and 10280 characters, and must match the following regular expression pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]* ]
                        Value - string - Optional - The value part of the identified key. [Constraints: The value must be between 0 and 10280 characters, and must match the following regular expression pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]* ]
                Jar - string - Required - A path to a JAR file run during the step. [Constraints: The value must be between 0 and 10280 characters, and must match the following regular expression pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]* ]
                MainClass - string - Optional - The name of the main class in the specified Java file. If not specified, the JAR file should specify a Main-Class in its manifest file. [Constraints: The value must be between 0 and 10280 characters, and must match the following regular expression pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]* ]
                Args - string|array - Optional - A list of command line arguments passed to the JAR file’s main function when executed. Pass a string for a single value, or an indexed array for multiple values.
$opt | array | Optional | An associative array of parameters that can have the following keys:
    curlopts - array - Optional - A set of values to pass directly into curl_setopt(), where the key is a pre-defined CURLOPT_* constant.
    returnCurlHandle - boolean - Optional - A private toggle specifying that the cURL handle be returned rather than actually completing the request. This toggle is useful for manually managed batch requests.
Returns
Examples
Add steps to an existing job flow.
$emr = new AmazonEMR();
$emr->set_region(AmazonEMR::REGION_IRELAND);

CFHadoopStep::$region = 'eu-west-1'; // Tell the Hadoop helpers to use the correct region.

$response = $emr->add_job_flow_steps('j-2PL8AAY8YJ06P', array(
    new CFStepConfig(array(
        'Name' => 'Enable Debugging',
        'ActionOnFailure' => 'CONTINUE',
        'HadoopJarStep' => CFHadoopStep::enable_debugging()
    )),
    new CFStepConfig(array(
        'Name' => 'Install Hive',
        'ActionOnFailure' => 'CONTINUE',
        'HadoopJarStep' => CFHadoopStep::install_hive()
    )),
    new CFStepConfig(array(
        'Name' => 'Install Pig',
        'ActionOnFailure' => 'CONTINUE',
        'HadoopJarStep' => CFHadoopStep::install_pig()
    ))
));

// Success?
var_dump($response->isOK());
Result:
bool(true)
Related Methods
Source
Method defined in services/emr.class.php
public function add_job_flow_steps($job_flow_id, $steps, $opt = null)
{
    if (!$opt) $opt = array();
    $opt['JobFlowId'] = $job_flow_id;

    // Required list + map
    $opt = array_merge($opt, CFComplexType::map(array(
        'Steps' => (is_array($steps) ? $steps : array($steps))
    ), 'member'));

    return $this->authenticate('AddJobFlowSteps', $opt);
}
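The "Required list + map" comment suggests that the nested steps array is flattened into dotted request parameter names before the request is signed. The sketch below is a hypothetical stand-in for that flattening, not the SDK's CFComplexType::map() itself. It assumes list entries are numbered from 1 and prefixed with "member", as the 'member' argument in the source hints. The function flatten_params() and the sample step are illustrative only.

```php
<?php
// Hypothetical illustration of flattening a nested array into dotted
// parameter names. This is an assumption about the behavior, not SDK code.
function flatten_params(array $data, $prefix = '')
{
    $out = array();

    foreach ($data as $key => $value) {
        // Integer keys mark list entries: 1-based and prefixed with "member".
        $part = is_int($key) ? 'member.' . ($key + 1) : $key;
        $name = ($prefix === '') ? $part : $prefix . '.' . $part;

        if (is_array($value)) {
            // Recurse into nested lists and maps, carrying the dotted prefix.
            $out = array_merge($out, flatten_params($value, $name));
        } else {
            $out[$name] = $value;
        }
    }

    return $out;
}

$params = flatten_params(array(
    'Steps' => array(
        array('Name' => 'Example step', 'ActionOnFailure' => 'CONTINUE'),
    ),
));

print_r($params);
```

Under these assumptions the result holds the keys Steps.member.1.Name and Steps.member.1.ActionOnFailure, which would then be merged into $opt alongside JobFlowId.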