Getting started with Amazon EMR Serverless
This tutorial helps you get started with EMR Serverless by deploying a sample Spark or
Hive workload. You'll create, run, and debug your own application. Most parts of this
tutorial use the default options.
Before you launch an EMR Serverless application, complete the following tasks.
Grant permissions to use EMR Serverless
To use EMR Serverless, you need a user or IAM role with an attached policy that
grants permissions for EMR Serverless. To create a user and attach the appropriate policy
to that user, follow the instructions in Grant permissions.
Prepare storage for EMR Serverless
In this tutorial, you'll use an S3 bucket to store output files and logs from the sample
Spark or Hive workload that you'll run using an EMR Serverless application. To create a
bucket, follow the instructions in Creating a bucket in the
Amazon Simple Storage Service Console User Guide. Replace any further reference to
amzn-s3-demo-bucket with the name of the newly created bucket.
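As a minimal sketch of the placeholder substitution described above, the following shell function rewrites every occurrence of amzn-s3-demo-bucket in text read from standard input. The helper name use_bucket and the bucket name my-emr-output-bucket are hypothetical; neither is part of EMR Serverless or the AWS CLI.

```shell
#!/usr/bin/env bash
# Sketch: substitute your own bucket name for the tutorial placeholder.
# use_bucket is a hypothetical helper name, not an AWS CLI command.

use_bucket() {
  local bucket="$1"
  # Replace every occurrence of the placeholder in the text read from stdin.
  # S3 bucket names contain only lowercase letters, digits, dots, and hyphens,
  # so the name is safe to use inside the sed replacement.
  sed "s/amzn-s3-demo-bucket/${bucket}/g"
}

# Example: print an S3 ARN with the hypothetical bucket name substituted.
echo 'arn:aws:s3:::amzn-s3-demo-bucket/*' | use_bucket "my-emr-output-bucket"
```

The same function could be applied to a whole policy file by redirecting the file to standard input and writing the result to a new file.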
Create an EMR Studio to run interactive workloads
If you want to use EMR Serverless to run interactive queries through notebooks that
are hosted in EMR Studio, you need to specify an S3 bucket and the minimum service role
for EMR Serverless to create a Workspace. For setup steps, see Set up an EMR Studio
in the Amazon EMR Management Guide. For more information about interactive workloads,
see Run interactive workloads with EMR Serverless through EMR Studio.
Create a job runtime role
Job runs in EMR Serverless use a runtime role that provides granular permissions to
specific AWS services and resources at runtime. In this tutorial, a public S3 bucket hosts
the data and scripts. The bucket amzn-s3-demo-bucket stores the output.
To set up a job runtime role, first create a runtime role with a trust policy so that
EMR Serverless can use the new role. Next, attach the required S3 access policy to that
role. The following steps guide you through the process.
- Console
-
-
Navigate to the IAM console at https://console.aws.amazon.com/iam/.
-
In the left navigation pane, choose Roles.
-
Choose Create role.
-
For role type, choose Custom trust policy and paste the
following trust policy. This allows jobs submitted to your Amazon EMR Serverless
applications to access other AWS services on your behalf.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "emr-serverless.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
-
Choose Next to navigate to the Add permissions page, then choose Create policy.
-
The Create policy page opens in a new tab. Paste the following policy JSON.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadAccessForEMRSamples",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::*.elasticmapreduce",
        "arn:aws:s3:::*.elasticmapreduce/*"
      ]
    },
    {
      "Sid": "FullAccessToOutputBucket",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket",
        "arn:aws:s3:::amzn-s3-demo-bucket/*"
      ]
    },
    {
      "Sid": "GlueCreateAndReadDataCatalog",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:CreateDatabase",
        "glue:GetDataBases",
        "glue:CreateTable",
        "glue:GetTable",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:CreatePartition",
        "glue:BatchCreatePartition",
        "glue:GetUserDefinedFunctions"
      ],
      "Resource": ["*"]
    }
  ]
}
-
On the Review policy page, enter a name for your policy,
such as EMRServerlessS3AndGlueAccessPolicy.
-
Refresh the Attach permissions policy page, and choose
EMRServerlessS3AndGlueAccessPolicy.
-
On the Name, review, and create page, for Role name, enter a name for your role,
for example, EMRServerlessS3RuntimeRole. To create this IAM role, choose
Create role.
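If you created the role in the console, a quick check from the AWS CLI can confirm that its trust policy names the EMR Serverless service principal. The following is a sketch under assumptions: the helper name role_trusts_service is hypothetical, the role name is the example used in this tutorial, and the query assumes each statement lists a single service principal, as in the trust policy shown earlier. It relies on the AWS CLI's iam get-role command and its global --query and --output options.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a role's trust policy names a given service principal.
# role_trusts_service is a hypothetical helper name, not an AWS CLI command.

role_trusts_service() {
  local role_name="$1" service="$2"
  # Print the service principals from the role's trust policy, one per line,
  # then look for an exact match. Assumes each statement has a single
  # service principal, as in the tutorial's trust policy.
  aws iam get-role --role-name "$role_name" \
    --query 'Role.AssumeRolePolicyDocument.Statement[].Principal.Service' \
    --output text \
    | tr '\t' '\n' | grep -qxF "$service"
}

# Usage (requires AWS credentials that allow iam:GetRole):
#   role_trusts_service EMRServerlessS3RuntimeRole emr-serverless.amazonaws.com \
#     && echo "trust policy names EMR Serverless"
```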
- CLI
-
-
Create a file named emr-serverless-trust-policy.json that contains the trust policy
to use for the IAM role. The file should contain the following policy.
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "EMRServerlessTrustPolicy",
    "Action": "sts:AssumeRole",
    "Effect": "Allow",
    "Principal": {
      "Service": "emr-serverless.amazonaws.com"
    }
  }]
}
-
Create an IAM role named EMRServerlessS3RuntimeRole. Use the
trust policy that you created in the previous step.
aws iam create-role \
  --role-name EMRServerlessS3RuntimeRole \
  --assume-role-policy-document file://emr-serverless-trust-policy.json
Note the ARN in the output. You use the ARN of the new role during job
submission; later steps refer to it as job-role-arn.
-
Create a file named emr-sample-access-policy.json that defines
the IAM policy for your workload. This policy provides read access to the script and
data stored in public S3 buckets and read-write access to amzn-s3-demo-bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadAccessForEMRSamples",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::*.elasticmapreduce",
        "arn:aws:s3:::*.elasticmapreduce/*"
      ]
    },
    {
      "Sid": "FullAccessToOutputBucket",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket",
        "arn:aws:s3:::amzn-s3-demo-bucket/*"
      ]
    },
    {
      "Sid": "GlueCreateAndReadDataCatalog",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:CreateDatabase",
        "glue:GetDataBases",
        "glue:CreateTable",
        "glue:GetTable",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:CreatePartition",
        "glue:BatchCreatePartition",
        "glue:GetUserDefinedFunctions"
      ],
      "Resource": ["*"]
    }
  ]
}
-
Create an IAM policy named EMRServerlessS3AndGlueAccessPolicy
with the policy file that you created in Step 3.
aws iam create-policy \
  --policy-name EMRServerlessS3AndGlueAccessPolicy \
  --policy-document file://emr-sample-access-policy.json
Note the new policy's ARN in the output. You'll substitute it for
policy-arn in the next step.
-
Attach the IAM policy EMRServerlessS3AndGlueAccessPolicy to the
job runtime role EMRServerlessS3RuntimeRole.
aws iam attach-role-policy \
  --role-name EMRServerlessS3RuntimeRole \
  --policy-arn policy-arn
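After the CLI steps above, you may want the role ARN in a shell variable and a way to confirm that the policy attachment took effect. The following is a minimal sketch rather than part of the official procedure. The helper names get_role_arn and is_policy_attached are hypothetical. The sketch uses the AWS CLI's iam get-role and iam list-attached-role-policies commands with the global --query and --output text options, and it assumes the role and policy names used in this tutorial.

```shell
#!/usr/bin/env bash
# Sketch: look up the job runtime role's ARN and verify the policy attachment.
# get_role_arn and is_policy_attached are hypothetical helper names.

ROLE_NAME="EMRServerlessS3RuntimeRole"
POLICY_NAME="EMRServerlessS3AndGlueAccessPolicy"

# Print the ARN of an existing role (the job-role-arn used at job submission).
get_role_arn() {
  aws iam get-role --role-name "$1" --query 'Role.Arn' --output text
}

# Succeed only if a managed policy with the given name is attached to the role.
is_policy_attached() {
  # Text output separates the policy names with tabs; put one name per line
  # and look for an exact match.
  aws iam list-attached-role-policies --role-name "$1" \
    --query 'AttachedPolicies[].PolicyName' --output text \
    | tr '\t' '\n' | grep -qxF "$2"
}

# Usage (requires AWS credentials with IAM read access):
#   job_role_arn="$(get_role_arn "$ROLE_NAME")"
#   is_policy_attached "$ROLE_NAME" "$POLICY_NAME" && echo "policy attached"
```

Wrapping the calls in functions keeps the lookups reusable when you later need job-role-arn for job submission.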