Connecting to Apache Oozie workflows with the AWS Schema Conversion Tool
You can use the AWS SCT command line interface (CLI) to convert Apache Oozie workflows to AWS Step Functions. After you migrate your Apache Hadoop workloads to Amazon EMR, you can use a native service in the AWS Cloud to orchestrate your jobs. For more information, see Connecting to Apache Hadoop.
AWS SCT converts your Oozie workflows to AWS Step Functions and uses AWS Lambda to emulate features that AWS Step Functions doesn't support. Also, AWS SCT converts your Oozie job properties to AWS Systems Manager.
To convert Apache Oozie workflows, make sure that you use AWS SCT version 1.0.671 or higher. Also, familiarize yourself with the command line interface of AWS SCT. For more information, see CLI Reference for AWS Schema Conversion Tool.
Prerequisites for using Apache Oozie as a source
The following prerequisites are required to connect to Apache Oozie with the AWS SCT CLI.
-
Create an Amazon S3 bucket to store the definitions of state machines. You can use these definitions to configure your state machines. For more information, see Creating a bucket in the Amazon S3 User Guide.
-
Create an AWS Identity and Access Management (IAM) role with the
AmazonS3FullAccess
policy. AWS SCT uses this IAM role to access your Amazon S3 bucket. -
Take a note of your AWS secret key and AWS secret access key. For more information about AWS access keys, see Managing access keys in the IAM User Guide.
-
Store your AWS credentials and the information about your Amazon S3 bucket in the AWS service profile in the global application settings. Then, AWS SCT uses this AWS service profile to work with your AWS resources. For more information, see Managing Profiles in the AWS Schema Conversion Tool.
To work with your source Apache Oozie workflows, AWS SCT requires the specific structure
of your source files. Each of your application folders must include the
job.properties
file. This file includes key-value pairs of your job
properties. Also, each of your application folders must include the workflow.xml
file. This file describes the action nodes and control flow nodes of your workflow.
Connecting to Apache Oozie as a source
Use the following procedure to connect to your Apache Oozie source files.
To connect to Apache Oozie in the AWS SCT CLI
-
Create a new AWS SCT CLI script or edit an existing scenario template. For example, you can download and edit the
OozieConversionTemplate.scts
template. For more information, see Getting CLI scenarios. -
Configure the AWS SCT application settings.
The following code example saves the application settings and allows to store passwords in your project. You can use these saved settings in other projects.
SetGlobalSettings -save: 'true' -settings: '{ "store_password": "true" }' /
-
Create a new AWS SCT project.
The following code example creates the
oozie
project in thec:\sct
folder.CreateProject -name: 'oozie' -directory: 'c:\sct' /
-
Add the folder with your source Apache Oozie files to the project using the
AddSource
command. Make sure that you use theAPACHE_OOZIE
value for thevendor
parameter. Also, provide values for the following required parameters:name
andmappingsFolder
.The following code example adds Apache Oozie as a source in your AWS SCT project. This example creates a source object with the name
OOZIE
. Use this object name to add mapping rules. After you run this code example, AWS SCT uses thec:\oozie
folder to load your source files in the project.AddSource -name: 'OOZIE' -vendor: 'APACHE_OOZIE' -mappingsFolder: 'c:\oozie' /
You can use this example and the following examples in Windows.
-
Connect to your source Apache Oozie files using the
ConnectSource
command. Use the name of your source object that you defined in the previous step.ConnectSource -name: 'OOZIE' -mappingsFolder: 'c:\oozie' /
-
Save your CLI script. Next, add the connection information for your AWS Step Functions service.
Permissions for using AWS Lambda functions in the extension pack
For the source functions that AWS Step Functions doesn't support, AWS SCT creates an extension pack. This extension pack includes AWS Lambda functions, which emulate your source functions.
To use this extension pack, create an AWS Identity and Access Management (IAM) role with the following permissions.
{ "Version": "2012-10-17", "Statement": [ { "Sid": "lambda", "Effect": "Allow", "Action": [ "lambda:InvokeFunction" ], "Resource": [ "arn:aws:lambda:*:498160209112:function:LoadParameterInitialState:*", "arn:aws:lambda:*:498160209112:function:EvaluateJSPELExpressions:*" ] }, { "Sid": "emr", "Effect": "Allow", "Action": [ "elasticmapreduce:DescribeStep", "elasticmapreduce:AddJobFlowSteps" ], "Resource": [ "arn:aws:elasticmapreduce:*:498160209112:cluster/*" ] }, { "Sid": "s3", "Effect": "Allow", "Action": [ "s3:GetObject" ], "Resource": [ "arn:aws:s3:::*/*" ] } ] }
To apply the extension pack, AWS SCT requires an IAM role with the following permissions.
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "iam:GetRole", "iam:ListRolePolicies", "iam:CreateRole", "iam:TagRole", "iam:PutRolePolicy", "iam:DeleteRolePolicy", "iam:DeleteRole", "iam:PassRole" ], "Resource": [ "arn:aws:iam::ACCOUNT_NUMBER:role/sct/*" ] }, { "Effect": "Allow", "Action": [ "iam:GetRole", "iam:ListRolePolicies" ], "Resource": [ "arn:aws:iam::ACCOUNT_NUMBER:role/lambda_LoadParameterInitialStateRole", "arn:aws:iam::ACCOUNT_NUMBER:role/lambda_EvaluateJSPELExpressionsRole", "arn:aws:iam::ACCOUNT_NUMBER:role/stepFunctions_MigratedOozieWorkflowRole" ] }, { "Effect": "Allow", "Action": [ "lambda:GetFunction", "lambda:CreateFunction", "lambda:UpdateFunctionCode", "lambda:DeleteFunction" ], "Resource": [ "arn:aws:lambda:*:ACCOUNT_NUMBER:function:LoadParameterInitialState", "arn:aws:lambda:*:ACCOUNT_NUMBER:function:EvaluateJSPELExpressions" ] } ] }
Connecting to AWS Step Functions as a target
Use the following procedure to connect to AWS Step Functions as a target.
To connect to AWS Step Functions in the AWS SCT CLI
-
Open your CLI script which includes the connection information for your Apache Oozie source files.
-
Add the information about your migration target in the AWS SCT project using the
AddTarget
command. Make sure that you use theSTEP_FUNCTIONS
value for thevendor
parameter. Also, provide values for the following required parameters:name
andprofile
.The following code example adds AWS Step Functions as a source in your AWS SCT project. This example creates a target object with the name
AWS_STEP_FUNCTIONS
. Use this object name when you create mapping rules. Also, this example uses an AWS SCT service profile that you created in the prerequisites step. Make sure that you replaceprofile_name
with the name of your profile.AddTarget -name: 'AWS_STEP_FUNCTIONS' -vendor: 'STEP_FUNCTIONS' -profile: '
profile_name
' /If you don't use the AWS service profile, make sure that you provide values for the following required parameters:
accessKey
,secretKey
,awsRegion
, ands3Path
. Use these parameters to specify your AWS secret access key, AWS secret key, AWS Region, and the path to your Amazon S3 bucket. -
Connect to AWS Step Functions using the
ConnectTarget
command. Use the name of your target object that you defined in the previous step.The following code example connects to the
AWS_STEP_FUNCTIONS
target object using your AWS service profile. Make sure that you replaceprofile_name
with the name of your profile.ConnectTarget -name: 'AWS_STEP_FUNCTIONS' -profile: '
profile_name
' / -
Save your CLI script. Next, add mapping rules and migration commands. For more information, see Converting Oozie workflows;.