Create a cluster with Apache Spark - Amazon EMR

Create a cluster with Apache Spark

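The following procedure creates a cluster with Spark installed, using Quick Options in the Amazon EMR console.

Alternatively, you can use Advanced Options to further customize your cluster setup, or submit steps to programmatically install applications and then run custom applications. With either cluster creation option, you can choose to use AWS Glue as your Spark SQL metastore. For more information, see Using the AWS Glue Data Catalog with Spark on Amazon EMR.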

To launch a cluster with Spark installed

Open the Amazon EMR console at https://console.aws.amazon.com/emr.
Choose Create cluster to use Quick Options.
Enter a Cluster name. The cluster name cannot contain the characters <, >, $, |, or ` (backtick).
For Software Configuration, choose a Release option.
For Applications, choose the Spark application bundle.
Select other options as needed, and then choose Create cluster.
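
The naming rule in the steps above can be checked before you open the console. The following is a minimal sketch in plain Java. The class and method names (`ClusterNameCheck`, `isValidClusterName`) are hypothetical, the non-empty requirement is an assumption, and the check covers only the five characters listed above, not any other limits the service may enforce.

```java
// Hypothetical helper: checks a cluster name against the characters
// that the procedure above says are not allowed (<, >, $, |, and backtick).
class ClusterNameCheck {

    // The forbidden characters, taken directly from the step above.
    private static final String FORBIDDEN = "<>$|`";

    // Returns true when the name is non-empty (an assumption of this sketch)
    // and contains none of the forbidden characters.
    static boolean isValidClusterName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            if (FORBIDDEN.indexOf(name.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidClusterName("Spark cluster")); // true
        System.out.println(isValidClusterName("Spark|cluster")); // false
    }
}
```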

Note
To configure Spark when you create the cluster, see Configure Spark.

To launch a cluster with Spark installed using the AWS CLI

Create the cluster with the following command.


aws emr create-cluster --name "Spark cluster" --release-label emr-7.8.0 --applications Name=Spark \
--ec2-attributes KeyName=myKey --instance-type m5.xlarge --instance-count 3 --use-default-roles

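Note

Linux line continuation characters (\) are included for readability. On Linux, you can either remove them or keep them in the command. On Windows, remove them or replace them with a caret (^).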
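
To make the line-continuation note concrete, here is a small sketch in plain Java that assembles the same `aws emr create-cluster` arguments shown above and renders them as a two-line command with either the Linux (`\`) or the Windows (`^`) continuation character. The class name `CreateClusterCommand` and its methods are hypothetical. The flags and values are copied from the command above, and the sketch only prints the command; it does not run it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: renders the CLI command above for different shells.
class CreateClusterCommand {

    // Argument groups, one per output line, mirroring the CLI command above.
    static List<List<String>> argumentLines() {
        List<List<String>> lines = new ArrayList<>();
        lines.add(Arrays.asList("aws", "emr", "create-cluster",
                "--name", "\"Spark cluster\"",
                "--release-label", "emr-7.8.0",
                "--applications", "Name=Spark"));
        lines.add(Arrays.asList("--ec2-attributes", "KeyName=myKey",
                "--instance-type", "m5.xlarge",
                "--instance-count", "3",
                "--use-default-roles"));
        return lines;
    }

    // Joins the groups, ending every line except the last with the continuation character.
    static String render(List<List<String>> lines, char continuation) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            sb.append(String.join(" ", lines.get(i)));
            if (i < lines.size() - 1) {
                sb.append(' ').append(continuation).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(argumentLines(), '\\')); // Linux shells
        System.out.println(render(argumentLines(), '^'));  // Windows command prompt
    }
}
```

Keeping the arguments as a list, rather than one long string, means only the rendering step needs to know which shell the command is meant for.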

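To launch a cluster with Spark installed using the SDK for Java

Specify Spark as an application with SupportedProductConfig used in RunJobFlowRequest; for release-label clusters such as the one in the example below, pass an Application object to withApplications instead.

The following example shows how to create a cluster with Spark using Java: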



import com.amazonaws.AmazonClientException;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.*;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class Main {

        public static void main(String[] args) {
                AWSCredentials credentials_profile = null;
                try {
                        credentials_profile = new ProfileCredentialsProvider("default").getCredentials();
                } catch (Exception e) {
                        throw new AmazonClientException(
                                        "Cannot load credentials from .aws/credentials file. " +
                                                        "Make sure that the credentials file exists and the profile name is specified within it.",
                                        e);
                }

                AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.standard()
                                .withCredentials(new AWSStaticCredentialsProvider(credentials_profile))
                                .withRegion(Regions.US_WEST_1)
                                .build();

                // create a step to enable debugging in the AWS Management Console
                StepFactory stepFactory = new StepFactory();
                StepConfig enabledebugging = new StepConfig()
                                .withName("Enable debugging")
                                .withActionOnFailure("TERMINATE_JOB_FLOW")
                                .withHadoopJarStep(stepFactory.newEnableDebuggingStep());

                Application spark = new Application().withName("Spark");

                RunJobFlowRequest request = new RunJobFlowRequest()
                                .withName("Spark Cluster")
                                .withReleaseLabel("emr-5.20.0")
                                .withSteps(enabledebugging)
                                .withApplications(spark)
                                .withLogUri("s3://path/to/my/logs/")
                                .withServiceRole("EMR_DefaultRole")
                                .withJobFlowRole("EMR_EC2_DefaultRole")
                                .withInstances(new JobFlowInstancesConfig()
                                                .withEc2SubnetId("subnet-12ab3c45")
                                                .withEc2KeyName("myEc2Key")
                                                .withInstanceCount(3)
                                                .withKeepJobFlowAliveWhenNoSteps(true)
                                                .withMasterInstanceType("m4.large")
                                                .withSlaveInstanceType("m4.large"));
                RunJobFlowResult result = emr.runJobFlow(request);
                // print only the cluster (job flow) ID rather than the whole result object
                System.out.println("The cluster ID is " + result.getJobFlowId());
        }
}
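
The example above throws an exception if the `default` profile cannot be loaded from the credentials file. As a rough pre-flight check, the following plain-Java sketch looks for a `[profile-name]` section header in a credentials file. The class `CredentialsProfileCheck` and the method `hasProfile` are hypothetical and not part of the AWS SDK. The sketch assumes the usual INI-style layout of `~/.aws/credentials` and does not validate the keys themselves.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of the AWS SDK.
class CredentialsProfileCheck {

    // Returns true if the file exists and contains a "[profileName]" section header.
    static boolean hasProfile(Path credentialsFile, String profileName) throws IOException {
        if (!Files.isRegularFile(credentialsFile)) {
            return false;
        }
        String header = "[" + profileName + "]";
        List<String> lines = Files.readAllLines(credentialsFile);
        for (String line : lines) {
            if (line.trim().equals(header)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the shared credentials file.
        Path file = Paths.get(System.getProperty("user.home"), ".aws", "credentials");
        System.out.println("default profile found: " + hasProfile(file, "default"));
    }
}
```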
