Troubleshooting - Amazon EMR

Troubleshooting

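This section describes how to troubleshoot problems with Amazon EMR on EKS. For information about troubleshooting general problems with Amazon EMR, see Troubleshoot a cluster in the Amazon EMR Management Guide.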

Resource mapping not found when installing the Helm chart

When you install the Helm chart, you might encounter the following error message.

Error: INSTALLATION FAILED: pulling from host 1234567890.dkr.ecr.us-west-2.amazonaws.com failed with status code [manifests 6.13.0]: 403 Forbidden Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: [resource mapping not found for name: "flink-operator-serving-cert" namespace: "<the namespace to install your operator>" from "": no matches for kind "Certificate" in version "cert-manager.io/v1" ensure CRDs are installed first, resource mapping not found for name: "flink-operator-selfsigned-issuer" namespace: "<the namespace to install your operator>" " from "": no matches for kind "Issuer" in version "cert-manager.io/v1" ensure CRDs are installed first].

To resolve this error, install cert-manager, which enables the webhook component to be added. You must install cert-manager on each Amazon EKS cluster that you use.

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0
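Before retrying the Helm installation, it can help to confirm that the two cert-manager CRDs named in the error (Certificate and Issuer) are registered in the cluster. The following is a minimal sketch rather than an official procedure: `missing_cert_manager_crds` is a hypothetical helper name, and it assumes that you pipe in the output of `kubectl get crd -o name`.

```shell
# Hypothetical helper: reads CRD names on stdin (one per line, as printed by
# `kubectl get crd -o name`) and prints each required cert-manager CRD that
# is absent. Prints nothing when both are present.
missing_cert_manager_crds() {
  crds=$(cat)
  for required in certificates.cert-manager.io issuers.cert-manager.io; do
    # Strip the "customresourcedefinition.../" prefix, then match whole lines
    # (-x) against the literal CRD name (-F).
    printf '%s\n' "$crds" | sed 's|.*/||' | grep -qxF "$required" \
      || printf '%s\n' "$required"
  done
}

# Usage (requires cluster access):
#   kubectl get crd -o name | missing_cert_manager_crds
```

If the helper prints a CRD name, cert-manager is likely not installed, or not fully installed, on that cluster.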

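If you see an Access Denied error, confirm that the IAM role set for operatorExecutionRoleArn in the Helm chart values.yaml file has the correct permissions. Also make sure that the IAM role set for executionRoleArn in your FlinkDeployment specification has the correct permissions.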

If your FlinkDeployment is stuck, use the following steps to force delete the deployment:

  1. Edit the deployment.

    kubectl edit -n <flink-namespace> flinkdeployments/<app-name>
  2. Remove this finalizer.

    finalizers:
      - flinkdeployments.flink.apache.org/finalizer
  3. Delete the deployment.

    kubectl delete -n <flink-namespace> flinkdeployments/<app-name>
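The steps above can also be sketched non-interactively with `kubectl patch`. This is a sketch under stated assumptions, not the documented procedure: the function name and its two arguments are hypothetical, and the merge patch clears every finalizer on the resource, not only flinkdeployments.flink.apache.org/finalizer.

```shell
# Hypothetical helper: force delete a stuck FlinkDeployment.
#   $1 = namespace of the Flink application, $2 = FlinkDeployment name
force_delete_flinkdeployment() {
  ns=$1
  app=$2
  # Clear all finalizers so that Kubernetes no longer waits for the operator.
  kubectl patch "flinkdeployments/$app" -n "$ns" --type=merge \
    -p '{"metadata":{"finalizers":null}}' || return 1
  # Delete the deployment; it may already be gone once the finalizer is removed.
  kubectl delete "flinkdeployments/$app" -n "$ns" --ignore-not-found
}

# Usage (requires cluster access):
#   force_delete_flinkdeployment my-flink-namespace my-app
```

Removing finalizers skips the operator's own cleanup, so this is best kept for deployments that are genuinely stuck.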

If you run your Flink application in an opt-in AWS Region, you might see the following errors:

Caused by: org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3://flink.txt: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: ABCDEFGHIJKL; S3 Extended Request ID: ABCDEFGHIJKLMNOP=; Proxy: null), S3 Extended Request ID: ABCDEFGHIJKLMNOP=:400 Bad Request: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: ABCDEFGHIJKL; S3 Extended Request ID: ABCDEFGHIJKLMNOP=; Proxy: null)
Caused by: org.apache.hadoop.fs.s3a.AWSBadRequestException: getS3Region on flink-application: software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 400, Request ID: ABCDEFGHIJKLMNOP, Extended Request ID: ABCDEFGHIJKLMNOPQRST==):null: null (Service: S3, Status Code: 400, Request ID: ABCDEFGHIJKLMNOP, Extended Request ID: AHl42uDNaTUFOus/5IIVNvSakBcMjMCH7dd37ky0vE6jhABCDEFGHIJKLMNOPQRST==)

To fix these errors, use the following configuration in your FlinkDeployment definition file.

spec:
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    fs.s3a.endpoint.region: OPT_IN_AWS_REGION_NAME

We also recommend that you use the SDKv2 credentials provider:

fs.s3a.aws.credentials.provider: software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider

If you want to use the SDKv1 credentials provider, make sure that your SDK supports your opt-in Region. For more information, see the aws-sdk-java GitHub repository.

If you receive an S3 AWSBadRequestException when you run Flink SQL statements in an opt-in Region, make sure that you set fs.s3a.endpoint.region: OPT_IN_AWS_REGION_NAME in your Flink configuration spec.
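A quick local check can catch a missing Region setting before you submit the application. The following is only a sketch: `s3a_endpoint_region` is a hypothetical helper name, and it assumes the setting appears on its own line in the simple `key: value` form inside your FlinkDeployment definition file.

```shell
# Hypothetical helper: print the fs.s3a.endpoint.region value found in a
# FlinkDeployment YAML file ($1); return non-zero if the key is absent.
# Assumes the key and value sit together on a single line.
s3a_endpoint_region() {
  awk '$1 == "fs.s3a.endpoint.region:" { print $2; found = 1 }
       END { exit !found }' "$1"
}

# Usage, with a hypothetical file name:
#   s3a_endpoint_region flink-deployment.yaml || echo "Region not set" >&2
```

This is a text match, not a YAML parser, so it does not verify that the key is nested under spec.flinkConfiguration.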

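For Amazon EMR releases 6.15.0 through 7.2.0, you might encounter the following error message when you run a Flink session job in the China Regions, which include China (Beijing) and China (Ningxia):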

Error: {"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3://ABCDPath: software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 400, Request ID: ABCDEFGH, Extended Request ID: ABCDEFGH:null: null (Service: S3, Status Code: 400, Request ID: ABCDEFGH, Extended Request ID: ABCDEFGH","additionalMetadata":{},"throwableList": [{"type":"org.apache.hadoop.fs.s3a.AWSBadRequestException","message":"getFileStatus on s3://ABCDPath: software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 400, Request ID: ABCDEFGH, Extended Request ID: ABCDEFGH:null: null (Service: S3, Status Code: 400, Request ID: ABCDEFGH, Extended Request ID: ABCDEFGH","additionalMetadata":{}},{"type":"software.amazon.awssdk.services.s3.model.S3Exception","message":"null (Service: S3, Status Code: 400, Request ID: ABCDEFGH, Extended Request ID: ABCDEFGH","additionalMetadata":{}}]}

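This is a known issue. The team is working on patching the Flink operator for all of these release versions. Until the patch is complete, you can fix this error by downloading the Flink operator Helm chart, extracting the compressed file, and making a configuration change in the Helm chart.

The specific steps are as follows: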

  1. Change directory to the local folder for your Helm chart, then run the following commands to pull the Helm chart and extract it.

    helm pull oci://public.ecr.aws/emr-on-eks/flink-kubernetes-operator \
      --version $VERSION \
      --namespace $NAMESPACE
    tar -zxvf flink-kubernetes-operator-$VERSION.tgz
  2. Go to the Helm chart folder and find the templates/flink-operator.yaml file.

  3. Find the flink-operator-config ConfigMap and add the following fs.s3a.endpoint.region configuration to flink-conf.yaml. For example:

    {{- if .Values.defaultConfiguration.create }}
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: flink-operator-config
      namespace: {{ .Release.Namespace }}
      labels:
        {{- include "flink-operator.labels" . | nindent 4 }}
    data:
      flink-conf.yaml: |+
        fs.s3a.endpoint.region: {{ .Values.emrContainers.awsRegion }}
  4. Install the local Helm chart and run your job.
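The last step can be sketched as a render check followed by the install. This is an illustration under several assumptions, not the official command: the function name and the release name flink-kubernetes-operator are hypothetical, defaultConfiguration.create is assumed to be enabled so that the ConfigMap renders, and your installation may need additional values that are not shown here.

```shell
# Hypothetical helper: check that the patched chart renders the Region
# setting, then install it from the local folder.
#   $1 = path to the extracted chart, $2 = namespace, $3 = AWS Region
install_patched_flink_operator() {
  chart=$1
  ns=$2
  region=$3
  # Render locally first; stop if the new setting is missing from the output.
  helm template flink-kubernetes-operator "$chart" --namespace "$ns" \
    --set emrContainers.awsRegion="$region" \
    | grep -F "fs.s3a.endpoint.region: $region" >/dev/null || {
      echo "fs.s3a.endpoint.region not rendered; check templates/flink-operator.yaml" >&2
      return 1
    }
  # Install from the local chart folder instead of the OCI registry.
  helm install flink-kubernetes-operator "$chart" --namespace "$ns" \
    --set emrContainers.awsRegion="$region"
}

# Usage, with example values:
#   install_patched_flink_operator ./flink-kubernetes-operator "$NAMESPACE" cn-north-1
```

Rendering with `helm template` first means that a mistake in the edited template surfaces before anything is applied to the cluster.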