Using job submitter classification
Overview
An Amazon EMR on EKS StartJobRun request creates a job submitter pod (also known as the job-runner pod) to spawn the Spark driver. You can use the emr-job-submitter classification to configure node selectors for the job submitter pod, and to set the image, CPU, and memory for the logging container on the job submitter pod.
The following settings are available under the emr-job-submitter classification:
jobsubmitter.node.selector.[labelKey] - Adds to the node selector of the job submitter pod, with key labelKey and the value as the configuration value. For example, you can set jobsubmitter.node.selector.identifier to myIdentifier, and the job submitter pod will have a node selector with key identifier and value myIdentifier. This can be used to specify which nodes the job submitter pod can be placed on. To add multiple node selector keys, set multiple configurations with this prefix.

jobsubmitter.logging.image - Sets a custom image to be used for the logging container on the job submitter pod.
jobsubmitter.logging.request.cores - Sets a custom value for the number of CPUs, in CPU units, for the logging container on the job submitter pod. By default, this is set to 100m.
jobsubmitter.logging.request.memory - Sets a custom value for the amount of memory, in bytes, for the logging container on the job submitter pod. By default, this is set to 200Mi. A mebibyte is a unit of measurement similar to a megabyte.
We recommend placing job submitter pods on On-Demand Instances. Placing a job submitter pod on Spot Instances can result in job failure if the instance that the pod runs on is subject to a Spot Instance interruption. You can also place the job submitter pod in a single Availability Zone, or use any Kubernetes labels that are applied to the nodes.
Job submitter classification examples
StartJobRun request with On-Demand node placement for the job submitter pod
cat >spark-python-in-s3-nodeselector-job-submitter.json << EOF
{
  "name": "spark-python-in-s3-nodeselector",
  "virtualClusterId": "virtual-cluster-id",
  "executionRoleArn": "execution-role-arn",
  "releaseLabel": "emr-6.11.0-latest",
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://S3-prefix/trip-count.py",
      "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6"
    }
  },
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults",
        "properties": {
          "spark.dynamicAllocation.enabled": "false"
        }
      },
      {
        "classification": "emr-job-submitter",
        "properties": {
          "jobsubmitter.node.selector.eks.amazonaws.com/capacityType": "ON_DEMAND"
        }
      }
    ],
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs",
        "logStreamNamePrefix": "demo"
      },
      "s3MonitoringConfiguration": {
        "logUri": "s3://joblogs"
      }
    }
  }
}
EOF

aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter.json
StartJobRun request with single-AZ node placement for the job submitter pod
cat >spark-python-in-s3-nodeselector-job-submitter-az.json << EOF
{
  "name": "spark-python-in-s3-nodeselector",
  "virtualClusterId": "virtual-cluster-id",
  "executionRoleArn": "execution-role-arn",
  "releaseLabel": "emr-6.11.0-latest",
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://S3-prefix/trip-count.py",
      "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6"
    }
  },
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults",
        "properties": {
          "spark.dynamicAllocation.enabled": "false"
        }
      },
      {
        "classification": "emr-job-submitter",
        "properties": {
          "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone"
        }
      }
    ],
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs",
        "logStreamNamePrefix": "demo"
      },
      "s3MonitoringConfiguration": {
        "logUri": "s3://joblogs"
      }
    }
  }
}
EOF

aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter-az.json
StartJobRun request with single-AZ and Amazon EC2 instance type placement for the job submitter pod
{
  "name": "spark-python-in-s3-nodeselector",
  "virtualClusterId": "virtual-cluster-id",
  "executionRoleArn": "execution-role-arn",
  "releaseLabel": "emr-6.11.0-latest",
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://S3-prefix/trip-count.py",
      "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.kubernetes.pyspark.pythonVersion=3 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6 --conf spark.sql.shuffle.partitions=1000"
    }
  },
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults",
        "properties": {
          "spark.dynamicAllocation.enabled": "false"
        }
      },
      {
        "classification": "emr-job-submitter",
        "properties": {
          "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone",
          "jobsubmitter.node.selector.node.kubernetes.io/instance-type": "m5.4xlarge"
        }
      }
    ],
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs",
        "logStreamNamePrefix": "demo"
      },
      "s3MonitoringConfiguration": {
        "logUri": "s3://joblogs"
      }
    }
  }
}
StartJobRun request with a custom logging container image, CPU, and memory
{
  "name": "spark-python",
  "virtualClusterId": "virtual-cluster-id",
  "executionRoleArn": "execution-role-arn",
  "releaseLabel": "emr-6.11.0-latest",
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://S3-prefix/trip-count.py"
    }
  },
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "emr-job-submitter",
        "properties": {
          "jobsubmitter.logging.image": "YOUR_ECR_IMAGE_URL",
          "jobsubmitter.logging.request.memory": "200Mi",
          "jobsubmitter.logging.request.cores": "0.5"
        }
      }
    ],
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs",
        "logStreamNamePrefix": "demo"
      },
      "s3MonitoringConfiguration": {
        "logUri": "s3://joblogs"
      }
    }
  }
}
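Request bodies like the ones above can also be assembled programmatically before being passed to the AWS CLI with --cli-input-json or to an SDK. The following is a minimal, non-authoritative Python sketch (the helper name is made up for illustration; the placeholder values must be replaced with your own) that builds the custom-logging request with the standard library:

```python
import json

# Illustrative sketch only: assemble the custom-logging StartJobRun payload
# shown above. Replace the placeholder arguments (virtual-cluster-id,
# execution-role-arn, S3-prefix, YOUR_ECR_IMAGE_URL) with real values.
def build_custom_logging_request(virtual_cluster_id, execution_role_arn,
                                 entry_point, logging_image):
    return {
        "name": "spark-python",
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": execution_role_arn,
        "releaseLabel": "emr-6.11.0-latest",
        "jobDriver": {"sparkSubmitJobDriver": {"entryPoint": entry_point}},
        "configurationOverrides": {
            "applicationConfiguration": [
                {
                    "classification": "emr-job-submitter",
                    "properties": {
                        "jobsubmitter.logging.image": logging_image,
                        "jobsubmitter.logging.request.memory": "200Mi",
                        "jobsubmitter.logging.request.cores": "0.5",
                    },
                }
            ]
        },
    }

request = build_custom_logging_request(
    "virtual-cluster-id", "execution-role-arn",
    "s3://S3-prefix/trip-count.py", "YOUR_ECR_IMAGE_URL")
print(json.dumps(request, indent=2))
# Write the output to a file, then submit it with:
#   aws emr-containers start-job-run --cli-input-json file://request.json
```

Building the payload in code makes it easier to vary the logging image or resource requests per job without hand-editing JSON files.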