
Set up cluster access permissions with IAM roles for service accounts (IRSA)

This section uses an example to demonstrate how to configure a Kubernetes service account to assume an AWS Identity and Access Management (IAM) role. Pods that use the service account can then access any AWS service that the role has permissions to access.
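
Behind the scenes, IRSA relies on the cluster's OpenID Connect (OIDC) identity provider: the IAM role trusts web-identity tokens that the provider issues for one specific Kubernetes service account. As a rough sketch, the trust policy that a tool such as eksctl provisions for the role looks like the following (the account ID, Region, and OIDC provider ID shown here are placeholder values):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:spark-operator:driver-account-sa"
        }
      }
    }
  ]
}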

The following example runs a Spark application that counts the words in a file in Amazon S3. To do this, you set up IAM roles for service accounts (IRSA) to authenticate and authorize Kubernetes service accounts.

Note

This example uses the "spark-operator" namespace for the Spark Operator and for the namespace where you submit the Spark application.

Prerequisites

Before you try the example on this page, complete the prerequisites.
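
The example application on this page reads s3://my-pod-bucket/poem.txt. If you have not already created the bucket and uploaded a text file as part of the prerequisites, a minimal sketch with the AWS CLI looks like this (the bucket and file names match the ones the rest of this page assumes):

# Create the bucket and upload the text file that the example reads
aws s3 mb s3://my-pod-bucket
aws s3 cp ./poem.txt s3://my-pod-bucket/poem.txt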

Configure a Kubernetes service account to assume an IAM role

Use the following steps to configure a Kubernetes service account to assume an IAM role that pods can use to access AWS services that the role has permissions to access.

  1. After you complete the prerequisites, use the AWS Command Line Interface to create an example-policy.json file. The policy allows read-only access to the file that you uploaded to Amazon S3:

    cat >example-policy.json <<EOF
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:ListBucket"
          ],
          "Resource": [
            "arn:aws:s3:::my-pod-bucket",
            "arn:aws:s3:::my-pod-bucket/*"
          ]
        }
      ]
    }
    EOF
  2. Then, create an IAM policy example-policy:

    aws iam create-policy --policy-name example-policy --policy-document file://example-policy.json
  3. Next, create an IAM role example-role and associate it with a Kubernetes service account for the Spark driver:

    eksctl create iamserviceaccount --name driver-account-sa --namespace spark-operator \
      --cluster my-cluster --role-name "example-role" \
      --attach-policy-arn arn:aws:iam::111122223333:policy/example-policy --approve
  4. Create a YAML file with the cluster role binding that the Spark driver service account requires:

    cat >spark-rbac.yaml <<EOF
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: driver-account-sa
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: spark-role
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: edit
    subjects:
      - kind: ServiceAccount
        name: driver-account-sa
        namespace: spark-operator
    EOF
  5. Apply the cluster role binding configuration:

    kubectl apply -f spark-rbac.yaml

The kubectl command should confirm the successful creation of the service account and cluster role binding:

serviceaccount/driver-account-sa created
clusterrolebinding.rbac.authorization.k8s.io/spark-role configured
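
Optionally, you can verify that eksctl associated the IAM role with the service account. With IRSA, the association appears as an eks.amazonaws.com/role-arn annotation on the service account:

kubectl describe serviceaccount driver-account-sa -n spark-operator

The output should include an annotation similar to eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/example-role.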

Run the Spark application from the Spark Operator

After you set up the Kubernetes service account, you can run a Spark application that counts the words in the text file that you uploaded as part of the prerequisites.

  1. Create a new file word-count.yaml with a SparkApplication definition for the word-count application.

    cat >word-count.yaml <<EOF
    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      name: word-count
      namespace: spark-operator
    spec:
      type: Java
      mode: cluster
      image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
      imagePullPolicy: Always
      mainClass: org.apache.spark.examples.JavaWordCount
      mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
      arguments:
        - s3://my-pod-bucket/poem.txt
      hadoopConf:
        # EMRFS filesystem
        fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
        fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
        fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
        fs.s3.buffer.dir: /mnt/s3
        fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
        mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
        mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
      sparkConf:
        # Required for EMR Runtime
        spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
        spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
        spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
        spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
      sparkVersion: "3.3.1"
      restartPolicy:
        type: Never
      driver:
        cores: 1
        coreLimit: "1200m"
        memory: "512m"
        labels:
          version: 3.3.1
        serviceAccount: driver-account-sa  # the service account you created in the previous procedure
      executor:
        cores: 1
        instances: 1
        memory: "512m"
        labels:
          version: 3.3.1
    EOF
  2. Submit the Spark application.

    kubectl apply -f word-count.yaml

    The kubectl command should return confirmation that you successfully created a SparkApplication object named word-count.

    sparkapplication.sparkoperator.k8s.io/word-count configured
  3. To check the events for the SparkApplication object, run the following command:

    kubectl describe sparkapplication word-count -n spark-operator

    The kubectl command should return the description of the SparkApplication with the events:

    Events:
      Type     Reason                               Age                    From            Message
      ----     ------                               ----                   ----            -------
      Normal   SparkApplicationSpecUpdateProcessed  3m2s (x2 over 17h)     spark-operator  Successfully processed spec update for SparkApplication word-count
      Warning  SparkApplicationPendingRerun         3m2s (x2 over 17h)     spark-operator  SparkApplication word-count is pending rerun
      Normal   SparkApplicationSubmitted            2m58s (x2 over 17h)    spark-operator  SparkApplication word-count was submitted successfully
      Normal   SparkDriverRunning                   2m56s (x2 over 17h)    spark-operator  Driver word-count-driver is running
      Normal   SparkExecutorPending                 2m50s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is pending
      Normal   SparkExecutorRunning                 2m48s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is running
      Normal   SparkDriverCompleted                 2m31s (x2 over 17h)    spark-operator  Driver word-count-driver completed
      Normal   SparkApplicationCompleted            2m31s (x2 over 17h)    spark-operator  SparkApplication word-count completed
      Normal   SparkExecutorCompleted               2m31s (x2 over 2m31s)  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] completed
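
In addition to the recorded events, you can watch the driver and executor pods directly while the application runs; the pod names follow the patterns shown in the events above:

kubectl get pods -n spark-operator --watch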

The application is now counting the words in the S3 file. To find the word count, refer to the log files for your driver:

kubectl logs pod/word-count-driver -n spark-operator

The kubectl command should return the contents of the log file with the results of your word-count application.

INFO DAGScheduler: Job 0 finished: collect at JavaWordCount.java:53, took 5.146519 s
Software: 1

For more information on how to submit applications to Spark through the Spark Operator, see Using a SparkApplication in the Kubernetes Operator for Apache Spark (spark-on-k8s-operator) documentation on GitHub.
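
When you finish with the example, you can remove the resources that it created. The following is a sketch that assumes the names used on this page; delete the IAM policy only after the role that references it is gone:

# Remove the Spark application and the cluster role binding
kubectl delete -f word-count.yaml
kubectl delete -f spark-rbac.yaml

# Remove the IAM role and service account that eksctl created, then the policy
eksctl delete iamserviceaccount --name driver-account-sa --namespace spark-operator --cluster my-cluster
aws iam delete-policy --policy-arn arn:aws:iam::111122223333:policy/example-policy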