Setting up cluster access permissions with IAM roles for service accounts (IRSA)
This section uses an example to demonstrate how to configure a Kubernetes service account to assume an AWS Identity and Access Management (IAM) role. Pods that use the service account can then access any AWS service that the role has permissions to access.
The following example runs a Spark application that counts the words in a file in Amazon S3. To do this, you set up IAM roles for service accounts (IRSA) to authenticate and authorize Kubernetes service accounts.
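Under the hood, IRSA links the two identity systems by annotating the Kubernetes service account with the IAM role's ARN. Amazon EKS then injects a projected web identity token and matching environment variables into every pod that uses that service account, and the AWS SDKs pick these up automatically. As an illustrative sketch only (my-pod is a hypothetical pod running under an IRSA-enabled service account), you can see the injected variables like this:

# Inspect the IRSA credentials that EKS injects into a pod:
kubectl exec my-pod -n spark-operator -- env | grep AWS_
# AWS_ROLE_ARN=arn:aws:iam::111122223333:role/example-role
# AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token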
Note
This example uses the "spark-operator" namespace both for the Spark Operator and as the namespace where you submit the Spark application.
Prerequisites
Before you try the example on this page, complete the following prerequisites:
-
Save your favorite poem in a text file named poem.txt, then upload the file to your S3 bucket. The Spark application that you create on this page will read the contents of the text file. For more information on uploading files to S3, see Uploading objects to a bucket in the Amazon Simple Storage Service User Guide.
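For example, assuming the bucket is named my-pod-bucket (the same placeholder used throughout this page), one way to upload the file with the AWS CLI is:

# Upload the poem to the bucket that the Spark application will read from:
aws s3 cp poem.txt s3://my-pod-bucket/poem.txt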
Configure a Kubernetes service account to assume an IAM role
Use the following steps to configure a Kubernetes service account to assume an IAM role that pods can use to access AWS services that the role has permissions to access.
-
After you complete the Prerequisites, use the AWS Command Line Interface to create an example-policy.json file that allows read-only access to the file that you uploaded to Amazon S3:

cat >example-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-pod-bucket",
        "arn:aws:s3:::my-pod-bucket/*"
      ]
    }
  ]
}
EOF
-
Then, create the IAM policy example-policy:

aws iam create-policy --policy-name example-policy --policy-document file://example-policy.json

This command returns the Amazon Resource Name (ARN) of the new policy, which you reference in the next step.
-
Next, create the IAM role example-role and associate it with a Kubernetes service account for the Spark driver. eksctl also annotates the service account with the role's ARN; you can confirm this with the verification sketch after these steps.

eksctl create iamserviceaccount --name driver-account-sa --namespace spark-operator \
  --cluster my-cluster --role-name "example-role" \
  --attach-policy-arn arn:aws:iam::111122223333:policy/example-policy --approve
-
Create a yaml file with the cluster role binding that is required for the Spark driver service account:

cat >spark-rbac.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: driver-account-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
  - kind: ServiceAccount
    name: driver-account-sa
    namespace: spark-operator
EOF
-
Apply the cluster role binding configuration:

kubectl apply -f spark-rbac.yaml

The kubectl command should confirm the successful creation of the account:

serviceaccount/driver-account-sa created
clusterrolebinding.rbac.authorization.k8s.io/spark-role configured
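After you finish these steps, you can optionally spot-check the setup. This is a minimal verification sketch that assumes the names used above (driver-account-sa, spark-operator, and the edit cluster role):

# Confirm the eks.amazonaws.com/role-arn annotation that eksctl added to the service account:
kubectl describe serviceaccount driver-account-sa -n spark-operator

# Confirm that the ClusterRoleBinding lets the service account manage pods:
kubectl auth can-i create pods -n spark-operator \
  --as=system:serviceaccount:spark-operator:driver-account-sa

The second command should print yes if the binding from spark-rbac.yaml applied correctly.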
Run an application from the Spark Operator
After you configure the Kubernetes service account, you can run a Spark application that counts the words in the text file that you uploaded as part of the Prerequisites.
-
Create a new file word-count.yaml with a SparkApplication definition for the word-count application. Note that the driver uses the driver-account-sa service account that you associated with the IAM role earlier, so that the driver pod can assume example-role.

cat >word-count.yaml <<EOF
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: word-count
  namespace: spark-operator
spec:
  type: Java
  mode: cluster
  image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.JavaWordCount
  mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
  arguments:
    - s3://my-pod-bucket/poem.txt
  hadoopConf:
    # EMRFS filesystem
    fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
    fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
    fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
    fs.s3.buffer.dir: /mnt/s3
    fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
    mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
    mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
  sparkConf:
    # Required for EMR Runtime
    spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
    spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
    spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
    spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
  sparkVersion: "3.3.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.3.1
    serviceAccount: driver-account-sa
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.3.1
EOF
-
Submit the Spark application.

kubectl apply -f word-count.yaml

The kubectl command should return confirmation that you successfully created a SparkApplication object named word-count:

sparkapplication.sparkoperator.k8s.io/word-count configured
-
To check the events for the SparkApplication object, run the following command:

kubectl describe sparkapplication word-count -n spark-operator

The kubectl command should return the description of the SparkApplication with its events:

Events:
  Type     Reason                               Age                    From            Message
  ----     ------                               ----                   ----            -------
  Normal   SparkApplicationSpecUpdateProcessed  3m2s (x2 over 17h)     spark-operator  Successfully processed spec update for SparkApplication word-count
  Warning  SparkApplicationPendingRerun         3m2s (x2 over 17h)     spark-operator  SparkApplication word-count is pending rerun
  Normal   SparkApplicationSubmitted            2m58s (x2 over 17h)    spark-operator  SparkApplication word-count was submitted successfully
  Normal   SparkDriverRunning                   2m56s (x2 over 17h)    spark-operator  Driver word-count-driver is running
  Normal   SparkExecutorPending                 2m50s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is pending
  Normal   SparkExecutorRunning                 2m48s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is running
  Normal   SparkDriverCompleted                 2m31s (x2 over 17h)    spark-operator  Driver word-count-driver completed
  Normal   SparkApplicationCompleted            2m31s (x2 over 17h)    spark-operator  SparkApplication word-count completed
  Normal   SparkExecutorCompleted               2m31s (x2 over 2m31s)  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] completed
The application is now counting the words in your S3 file. To find the word counts, refer to the log files for your driver:

kubectl logs pod/word-count-driver -n spark-operator

The kubectl command should return the contents of the log file with the results of your word-count application:

INFO DAGScheduler: Job 0 finished: collect at JavaWordCount.java:53, took 5.146519 s
Software: 1
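The driver log also contains a large amount of Spark framework output. As a small, optional sketch (it assumes the JavaWordCount output format shown above, where each line is a word followed by a colon and a count), you can filter the log down to just the word-count lines:

# Keep only lines that look like "word: count":
kubectl logs pod/word-count-driver -n spark-operator | grep -E '^[^ ]+: [0-9]+$'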
For more information on how to submit applications to Spark through the Spark Operator, see SparkApplication in the spark-on-k8s-operator documentation on GitHub.