
Enable Amazon EKS Auto Mode across EKS clusters by using GitHub Actions - AWS Prescriptive Guidance


Urbija Goswami and Anugrah Lakra, Amazon Web Services

Summary

Amazon Elastic Kubernetes Service (Amazon EKS) clusters traditionally require manual management of compute resources through node groups. This creates operational overhead for:

  • Capacity planning and scaling decisions

  • Node provisioning and lifecycle management

  • Cost optimization across different workload types

  • Infrastructure maintenance and updates

Amazon EKS Auto Mode automates compute resource management by dynamically provisioning and scaling nodes based on workload demands, eliminating the need for manual node group management.

However, many organizations struggle to consistently enable and manage Amazon EKS Auto Mode across their existing and new clusters. Common challenges include:

  • Complex migration processes from existing node groups

  • Risk of service disruption during transition

  • Need for careful capacity planning and testing

  • Requirement for specific AWS Identity and Access Management (IAM) permissions and configurations

  • Coordination across multiple teams and environments

This pattern implements a GitHub Actions workflow that enables EKS Auto Mode on EKS clusters in a specific AWS Region. Before enabling Auto Mode, the workflow creates timestamped backups of the cluster state (cluster configuration, node groups, Helm releases, and custom resources) and uploads them to an Amazon S3 bucket.
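The backup step can be sketched as a Bash function, assuming the AWS CLI, kubectl, and Helm are installed and the caller can read the cluster. The function names (backup_prefix, backup_cluster) and the S3 layout eks-backups/<cluster>/<timestamp> are hypothetical. Custom resource definitions (CRDs) stand in for custom resources here, and the repository's workflow may capture more or different objects.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the pre-migration backup; not the repository's actual script.

# Build the S3 prefix for one backup run. The eks-backups/<cluster>/<timestamp>
# layout is an assumption.
backup_prefix() {
  local bucket=$1 cluster=$2 timestamp=$3
  printf 's3://%s/eks-backups/%s/%s\n' "$bucket" "$cluster" "$timestamp"
}

# Save cluster config, node groups, Helm releases, and CRDs, then upload them to S3.
backup_cluster() {
  local cluster=$1 region=$2 bucket=$3 timestamp dir ng
  timestamp=$(date -u +%Y%m%dT%H%M%SZ)
  dir=$(mktemp -d) || return 1

  # Point kubectl and helm at the target cluster.
  aws eks update-kubeconfig --name "$cluster" --region "$region" || return 1

  aws eks describe-cluster --name "$cluster" --region "$region" \
    --output json > "$dir/cluster.json" || return 1
  for ng in $(aws eks list-nodegroups --cluster-name "$cluster" --region "$region" \
      --query 'nodegroups[]' --output text); do
    aws eks describe-nodegroup --cluster-name "$cluster" --nodegroup-name "$ng" \
      --region "$region" --output json > "$dir/nodegroup-$ng.json" || return 1
  done
  helm list --all-namespaces --output json > "$dir/helm-releases.json" || return 1
  kubectl get crd --output json > "$dir/crds.json" || return 1

  # --sse AES256 requests SSE-S3 encryption for every uploaded object.
  aws s3 cp "$dir" "$(backup_prefix "$bucket" "$cluster" "$timestamp")/" \
    --recursive --sse AES256
}
```

For example, backup_cluster my-cluster us-east-1 my-backup-bucket writes one timestamped prefix per run, so repeated runs never overwrite earlier backups.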

After enabling Auto Mode, the workflow drains and deletes old node groups, updates cluster role permissions, and cleans up previous scaling components such as Karpenter and Cluster Autoscaler. The workflow can be integrated with existing continuous integration and continuous delivery/deployment (CI/CD) pipelines.
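The drain-and-delete step might look like the following minimal sketch, assuming Auto Mode is already enabled (so evicted pods can land on Auto Mode nodes) and kubectl points at the cluster. The helper names (run, retire_nodegroup), the DRY_RUN switch, and the 10-minute drain timeout are hypothetical; the eks.amazonaws.com/nodegroup label is the one that EKS applies to managed node group nodes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: drain one managed node group, then delete it.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

retire_nodegroup() {
  local cluster=$1 nodegroup=$2 region=$3
  # Evict pods from every node in the group; DaemonSet pods are left in place.
  run kubectl drain --selector "eks.amazonaws.com/nodegroup=$nodegroup" \
    --ignore-daemonsets --delete-emptydir-data --timeout=10m || return 1
  run aws eks delete-nodegroup --cluster-name "$cluster" \
    --nodegroup-name "$nodegroup" --region "$region" || return 1
  # Block until EKS reports that the node group is gone.
  run aws eks wait nodegroup-deleted --cluster-name "$cluster" \
    --nodegroup-name "$nodegroup" --region "$region"
}
```

For example, DRY_RUN=1 retire_nodegroup my-cluster my-nodegroup us-east-1 prints the three commands without running them. Note that --delete-emptydir-data discards data in emptyDir volumes on the drained nodes.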

Prerequisites and limitations

Prerequisites

1. Required

2. Local tools installation

3. EKS Cluster Requirements

  • Kubernetes version 1.29 or later

  • Endpoint access configuration:

    • Either both public and private endpoint access enabled

    • Or private endpoint access only, with a NAT gateway in the private subnets

  • Cluster authentication mode set to EKS API and ConfigMap (API_AND_CONFIG_MAP). This mode is required so that EKS can dynamically manage Auto Mode nodes and so that the aws-auth ConfigMap can be updated for cluster authentication during migration.

  • Active node groups or managed node pools

4. IAM OIDC Configuration Requirements

  • IAM role and identity provider for GitHub that includes:

    • Trust policy for GitHub OIDC

    • Permissions for:

      • EKS Cluster management

      • S3 bucket access

      • IAM role management

      • EC2 network management

  • For a simple setup, apply the iam.tf file from the repository with Terraform. Applying it creates the IAM role (GitHubActionsEKSRole).
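The endpoint and authentication-mode requirements can be checked up front with a sketch like the following, which assumes the AWS CLI's text output format. The function names are hypothetical. A NAT gateway can't be seen from describe-cluster, so the check only warns about private-only endpoints instead of failing.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: check endpoint access and authentication mode prerequisites.

# Args: endpointPublicAccess, endpointPrivateAccess, authenticationMode, as printed
# by the AWS CLI text output (for example: True True API_AND_CONFIG_MAP).
check_prereqs() {
  local public=$1 private=$2 auth_mode=$3 ok=0
  if [ "$private" != "True" ]; then
    echo "FAIL: private endpoint access is disabled" >&2
    ok=1
  elif [ "$public" != "True" ]; then
    # Private-only endpoints also need a NAT gateway, which this check cannot see.
    echo "WARN: private-only endpoint; confirm a NAT gateway serves the private subnets" >&2
  fi
  if [ "$auth_mode" != "API_AND_CONFIG_MAP" ]; then
    echo "FAIL: authentication mode is '$auth_mode', expected API_AND_CONFIG_MAP" >&2
    ok=1
  fi
  return "$ok"
}

check_cluster_prereqs() {
  local cluster=$1 region=$2
  # Unquoted on purpose: split the tab-separated text output into three arguments.
  set -- $(aws eks describe-cluster --name "$cluster" --region "$region" --output text \
    --query 'cluster.[resourcesVpcConfig.endpointPublicAccess,resourcesVpcConfig.endpointPrivateAccess,accessConfig.authenticationMode]')
  check_prereqs "$1" "$2" "$3"
}
```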

Limitations

  • Supports only EKS clusters that run Kubernetes version 1.29 or later

  • Supports only Karpenter version 1.1.0 or later

  • Region-specific implementation. Some AWS services aren't available in all AWS Regions. For Region availability, see AWS services by Region.

  • Requires network access to the cluster endpoint from the workflow runner

  • Supports only Amazon EKS managed node groups
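Because version strings such as 1.9 and 1.10 don't compare correctly as plain strings, a numeric comparison helps when enforcing the Kubernetes 1.29 and Karpenter 1.1.0 limits. The helpers below (version_ge, check_cluster_version) are a hypothetical sketch, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: numeric, dot-separated version comparison.

# Succeeds (exit status 0) if version $1 >= version $2. A leading "v" is ignored,
# and missing components count as 0 (so 1.29 equals 1.29.0).
version_ge() {
  awk -v a="${1#v}" -v b="${2#v}" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

check_cluster_version() {
  local cluster=$1 region=$2 version
  version=$(aws eks describe-cluster --name "$cluster" --region "$region" \
    --query 'cluster.version' --output text) || return 1
  if version_ge "$version" 1.29; then
    echo "$cluster: Kubernetes $version is supported"
  else
    echo "$cluster: Kubernetes $version is below 1.29" >&2
    return 1
  fi
}
```

The same helper applies to Karpenter, for example version_ge "$KARPENTER_VERSION" 1.1.0, where KARPENTER_VERSION is a hypothetical variable that you populate from your installed chart or image tag.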

Architecture

Target technology stack

Target architecture

  1. A user triggers the GitHub Actions workflow from the GitHub repository.

  2. The workflow assumes an IAM role through OIDC to make the necessary changes in the AWS account. It also checks whether the EKS Auto Mode node role exists in the account; if it doesn't, the workflow creates the role and attaches the necessary policies.

  3. The workflow uploads a backup of the current state of each EKS cluster that needs Auto Mode enabled to an S3 bucket.

  4. The workflow retrieves the cluster role of each target cluster and, for EKS Auto Mode, attaches any of the following policies that are missing: AmazonEKSComputePolicy, AmazonEKSBlockStoragePolicy, AmazonEKSLoadBalancingPolicy, AmazonEKSNetworkingPolicy, and AmazonEKSClusterPolicy. As a pre-migration step, it also tags the cluster subnets for EKS Auto Mode enablement.

  5. The workflow enables EKS Auto Mode on the cluster.

  6. The workflow identifies and deletes old node groups. This step is skipped if you haven't granted the IAM role cluster access through the optional "Set up IAM for backup and node group deletion" task in the Epics section.

  7. The workflow removes previous scaling components (Karpenter and Cluster Autoscaler) if they are installed.
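Steps 4 and 5 can be sketched as follows, assuming the Auto Mode node role from step 2 already exists and its ARN is passed in. The function names are hypothetical, and the built-in node pool names (general-purpose, system) follow the Amazon EKS documentation; the repository's workflow may configure these differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: attach missing Auto Mode policies to the cluster role,
# then enable Auto Mode.

REQUIRED_POLICIES="AmazonEKSClusterPolicy AmazonEKSComputePolicy AmazonEKSBlockStoragePolicy AmazonEKSLoadBalancingPolicy AmazonEKSNetworkingPolicy"

# Reads attached policy ARNs (one per line) on stdin and prints each required
# AWS managed policy ARN that is not yet attached.
missing_policies() {
  local attached p arn
  attached=$(cat)
  for p in $REQUIRED_POLICIES; do
    arn="arn:aws:iam::aws:policy/$p"
    printf '%s\n' "$attached" | grep -qxF "$arn" || printf '%s\n' "$arn"
  done
}

prepare_and_enable() {
  local cluster=$1 region=$2 node_role_arn=$3 role_arn role arn
  role_arn=$(aws eks describe-cluster --name "$cluster" --region "$region" \
    --query 'cluster.roleArn' --output text) || return 1
  role=${role_arn##*/}   # the role name is the last path segment of the ARN

  aws iam list-attached-role-policies --role-name "$role" \
    --query 'AttachedPolicies[].PolicyArn' --output text | tr '\t' '\n' |
    missing_policies | while read -r arn; do
      aws iam attach-role-policy --role-name "$role" --policy-arn "$arn"
    done

  # Turn on compute, block storage, and load balancing together in one update.
  aws eks update-cluster-config --name "$cluster" --region "$region" \
    --compute-config "{\"enabled\": true, \"nodeRoleArn\": \"$node_role_arn\", \"nodePools\": [\"general-purpose\", \"system\"]}" \
    --kubernetes-network-config '{"elasticLoadBalancing": {"enabled": true}}' \
    --storage-config '{"blockStorage": {"enabled": true}}'
}
```

Attaching only the missing policies keeps the step idempotent, so rerunning the workflow against a partially migrated cluster doesn't fail on existing attachments.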

The GitHub Actions workflow consists of three main jobs:

  1. check-clusters: Identifies clusters without Auto Mode enabled and updates IAM policies and subnet tags.

  2. backup-and-check: Backs up cluster state before migration.

  3. gradual-migration: Enables Auto Mode while gradually draining existing node groups and cleaning up old scaling components. It also verifies the final state of each cluster after migration.
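The subnet tagging that check-clusters performs could resemble this sketch. The tag keys kubernetes.io/role/elb (public subnets) and kubernetes.io/role/internal-elb (private subnets) are the standard EKS load balancer subnet tags. Classifying a subnet as public from MapPublicIpOnLaunch is a simplifying heuristic (route tables are the authoritative signal), and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: tag cluster subnets so Auto Mode can place load balancers.

# $1 is MapPublicIpOnLaunch as printed by the CLI ("True" or "False").
lb_tag_key() {
  if [ "$1" = "True" ]; then
    echo "kubernetes.io/role/elb"
  else
    echo "kubernetes.io/role/internal-elb"
  fi
}

tag_cluster_subnets() {
  local cluster=$1 region=$2 subnets subnet_id public
  subnets=$(aws eks describe-cluster --name "$cluster" --region "$region" \
    --query 'cluster.resourcesVpcConfig.subnetIds' --output text) || return 1
  # $subnets is intentionally unquoted so each subnet ID becomes its own argument.
  aws ec2 describe-subnets --region "$region" --subnet-ids $subnets \
    --query 'Subnets[].[SubnetId,MapPublicIpOnLaunch]' --output text |
    while read -r subnet_id public; do
      aws ec2 create-tags --region "$region" --resources "$subnet_id" \
        --tags "Key=$(lb_tag_key "$public"),Value=1"
    done
}
```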

Note

If you need node configuration backups or plan to delete nodes or node groups during migration to EKS Auto Mode, add the IAM role created by the Terraform code to the aws-auth ConfigMap. Without this mapping, the workflow can still view node group configurations.

Tools

AWS CLI:

AWS Command Line Interface (AWS CLI) is an open source tool that helps you interact with AWS services through commands in your command-line shell. In our solution, we use the AWS CLI to run EKS cluster configuration updates and IAM role updates, and to query cluster status throughout the automation process.

Amazon EKS:

Amazon Elastic Kubernetes Service (Amazon EKS) helps you run Kubernetes on AWS without needing to install or maintain your own Kubernetes control plane or nodes. In this pattern, Amazon EKS is the target service where Auto Mode is enabled to automate compute provisioning and node scaling across clusters in a specific Region.

IAM:

AWS Identity and Access Management (IAM) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them. In our solution, we use it to grant GitHub Actions, through OIDC federation, the permissions it needs to modify EKS cluster configurations. The solution also updates the cluster role permissions and includes a job that creates the EKS Auto Mode node role, so that EKS Auto Mode can schedule pending pods on the new nodes that it launches for its node pools.
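A minimal sketch of the node role job follows, assuming the role name AmazonEKSAutoNodeRole and the two AWS managed policies (AmazonEKSWorkerNodeMinimalPolicy, AmazonEC2ContainerRegistryPullOnly) used in the Amazon EKS Auto Mode documentation. The workflow's actual role name, trust policy, and policy set may differ, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: create the Auto Mode node role only if it doesn't exist.

# Trust policy that lets EC2 instances (the Auto Mode nodes) assume the role.
node_role_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

ensure_node_role() {
  local role=${1:-AmazonEKSAutoNodeRole} p
  if aws iam get-role --role-name "$role" >/dev/null 2>&1; then
    echo "Node role $role already exists"
    return 0
  fi
  aws iam create-role --role-name "$role" \
    --assume-role-policy-document "$(node_role_trust_policy)" || return 1
  for p in AmazonEKSWorkerNodeMinimalPolicy AmazonEC2ContainerRegistryPullOnly; do
    aws iam attach-role-policy --role-name "$role" \
      --policy-arn "arn:aws:iam::aws:policy/$p" || return 1
  done
}
```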

Amazon S3:

Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data. In our solution, an S3 bucket stores timestamped backups of the clusters, taken before EKS Auto Mode is enabled, to support disaster recovery.

Other tools:

GitHub Actions:

GitHub Actions is a CI/CD platform. In our solution, it automates the EKS Auto Mode enablement workflow, provides secure authentication to AWS through OIDC, and manages pipeline execution across multiple clusters.

HashiCorp Terraform:

Terraform is an infrastructure as code (IaC) tool that helps you use code to provision and manage cloud infrastructure and resources. Our solution uses Terraform to provision IAM roles and policies and to configure the OIDC provider for secure GitHub Actions integration.
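If you want to inspect or adjust the trust relationship outside Terraform, the following sketch prints a trust policy for GitHubActionsEKSRole that is scoped to one repository and branch, in line with the Security best practices. It assumes the GitHub OIDC provider already exists in the account. The function name and argument order are hypothetical, and the repository's iam.tf may express the conditions differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: GitHub OIDC trust policy scoped to one repository and branch.
# Args: AWS account ID, "owner/repo", branch name.
github_trust_policy() {
  local account_id=$1 repo=$2 branch=$3
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "repo:${repo}:ref:refs/heads/${branch}"
        }
      }
    }
  ]
}
EOF
}
```

For example: aws iam update-assume-role-policy --role-name GitHubActionsEKSRole --policy-document "$(github_trust_policy 111122223333 my-org/my-repo main)". Because the workflow also runs on dev and feature branches, covering them would need a StringLike condition with a list of allowed sub values.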

Code repository

The code for this pattern is available in the GitHub EKS Auto Mode Enablement via GitHub Actions repository.

Best practices

  • Security:

    • Follow the principle of least privilege and grant the minimum permissions required to perform a task. For more information, see Grant least privilege and Security best practices in the IAM documentation. See the iam.tf file in the repository for the minimum required configuration. 

    • Scope the IAM role trust policy to your specific GitHub repository and branch to prevent unauthorized workflow runs from assuming the role. 

    • Enable EKS control plane logging (API server, audit, authenticator) before starting the migration so you can diagnose scheduling or authentication issues after Auto Mode is enabled. 

    • Add --sse AES256 to all aws s3 cp commands in the backup script to enforce server-side encryption on cluster state backups. 

  • Reliability:

    • Test the workflow against a non-production cluster first. Verify that workloads reschedule correctly on Auto Mode nodes before migrating production clusters. 

    • Verify that S3 backups completed successfully and contain valid cluster config, node group, Helm release, and custom resource data before proceeding with Auto Mode enablement. 

    • After enabling Auto Mode, monitor pod scheduling events and node provisioning latency using Amazon CloudWatch Container Insights to detect issues early. 

  • Performance:

    • Review Auto Mode node pool scaling patterns periodically and adjust workload resource requests and limits to avoid over-provisioning or scheduling delays.

  • Cost:

    • Tag EKS clusters and associated resources (IAM roles, S3 backup buckets, subnets) with environment and ownership metadata to support cost tracking and operational visibility. For more information, see the Tagging AWS resources documentation. You can edit the workflow file to add custom tags during the migration process.

    • Use AWS Cost Explorer to monitor changes in compute costs after you enable Auto Mode, because Auto Mode may change instance types and scaling behavior. For more information, see the Analyzing your costs with AWS Cost Explorer documentation.

  • Operations:

    • Keep the workflow file and Terraform configurations in version control and document any environment-specific overrides such as region, role ARN, or S3 bucket name.   
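The control plane logging recommendation can be applied with a sketch like this one. The function names are hypothetical; the --logging JSON shape is the one that aws eks update-cluster-config accepts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: enable selected control plane log types before migrating.

# Builds the JSON accepted by `aws eks update-cluster-config --logging`.
logging_config() {
  local types="" t
  for t in "$@"; do
    types="${types:+$types,}\"$t\""
  done
  printf '{"clusterLogging":[{"types":[%s],"enabled":true}]}\n' "$types"
}

enable_control_plane_logs() {
  local cluster=$1 region=$2
  aws eks update-cluster-config --name "$cluster" --region "$region" \
    --logging "$(logging_config api audit authenticator)"
}
```

EKS may reject a configuration update while another update on the same cluster is still in progress, so run this well before the Auto Mode update rather than alongside it.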

Epics

Task | Description | Skills required

Configure the GitHub repository.

  1. Fork or clone the GitHub repository, and then copy the workflow file to your own GitHub repository:

    git clone https://github.com/aws-samples/sample-enable-eks-auto-mode-using-github-actions.git
    cd sample-enable-eks-auto-mode-using-github-actions
    mkdir -p /path/to/your/repository/.github/workflows
    cp .github/workflows/enable-eks-auto-mode.yml /path/to/your/repository/.github/workflows/
  2. Commit and push the changes to your GitHub repository:

    cd <path/to/your/repository>
    git add .
    git commit -m "Added EKS Auto Mode configurations"
    git push origin main
  3. Set up the GitHub Actions secrets for the repository:

    gh auth login --web  # authenticate to your GitHub account in a browser
    # Create the repository secrets
    gh secret set AWS_REGION --body "us-east-1"
    gh secret set AWS_ROLE_ARN --body "arn:aws:iam::ACCOUNT_ID:role/GitHubActionsEKSRole"  # replace ACCOUNT_ID with your AWS account ID
AWS DevOps, Cloud architect
Task | Description | Skills required

Set up IAM for backup and node group deletion (optional)

  1. Add the role to the aws-auth ConfigMap by running the following command in your terminal:

    eksctl create iamidentitymapping \
      --cluster "$CLUSTER_NAME" \
      --region us-east-1 \
      --arn "arn:aws:iam::$ACCOUNT_ID:role/GitHubActionsEKSRole" \
      --group system:masters \
      --username github-actions

Replace $CLUSTER_NAME and $ACCOUNT_ID with the appropriate values, and change --region if your cluster is in a different Region.

  2. For more than one cluster, run the following commands in a terminal while assuming a role that has administrator (or equivalent) access to your account:

    CLUSTERS=$(aws eks list-clusters --region "$AWS_REGION" --query 'clusters[]' --output text)
    CLUSTERS_NEEDING_AUTO_MODE=""
    for cluster in $CLUSTERS; do
      AUTO_MODE=$(aws eks describe-cluster --name "$cluster" --region "$AWS_REGION" \
        --query 'cluster.computeConfig.enabled' --output text 2>/dev/null || echo "false")
      if [ "$AUTO_MODE" != "True" ]; then
        CLUSTERS_NEEDING_AUTO_MODE="$CLUSTERS_NEEDING_AUTO_MODE $cluster"
        echo "Adding role access to $cluster..."
        eksctl create iamidentitymapping \
          --cluster "$cluster" \
          --region "$AWS_REGION" \
          --arn "$ROLE_ARN" \
          --group system:masters \
          --username github-actions || echo "⚠️ Role mapping may already exist"
        echo "✅ Role access configured for $cluster"
      fi
    done

Set AWS_REGION and ROLE_ARN to your Region and to the ARN of the IAM role created earlier (GitHubActionsEKSRole), respectively.

AWS DevOps, Cloud architect
Task | Description | Skills required

Trigger the GitHub Actions workflow.

The workflow is triggered automatically when changes are pushed to the feature, main, or dev branches. To trigger it manually from the GitHub UI:

  1. Go to the repository on GitHub.

  2. Choose the Actions tab.

  3. Select the workflow (auto-mode-pipeline).

  4. Choose the Run workflow button.

  5. Choose the branch, and then choose Run workflow.

After migration, the workflow verifies each migrated cluster by querying its compute configuration with the AWS CLI, confirming that EKS Auto Mode is enabled, and displaying the current compute settings in a table.

AWS DevOps, Cloud architect
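The post-migration verification can be approximated with the following sketch. The function names are hypothetical, and the repository's workflow may format its report differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: report the Auto Mode compute settings for every cluster in a Region.

# Succeeds when the CLI text output for cluster.computeConfig.enabled is "True".
# Clusters without a compute config print "None", which counts as not enabled.
is_auto_mode_enabled() {
  [ "$1" = "True" ]
}

verify_auto_mode() {
  local region=$1 cluster enabled status=0
  for cluster in $(aws eks list-clusters --region "$region" --query 'clusters[]' --output text); do
    enabled=$(aws eks describe-cluster --name "$cluster" --region "$region" \
      --query 'cluster.computeConfig.enabled' --output text 2>/dev/null)
    if is_auto_mode_enabled "$enabled"; then
      echo "$cluster: Auto Mode enabled"
      # Show the compute settings (enabled flag, node pools, node role) as a table.
      aws eks describe-cluster --name "$cluster" --region "$region" \
        --query 'cluster.computeConfig' --output table
    else
      echo "$cluster: Auto Mode not enabled" >&2
      status=1
    fi
  done
  return "$status"
}
```

Returning a non-zero status when any cluster is still unmigrated lets a CI step fail visibly instead of only printing a warning.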
Task | Description | Skills required

Implementation of multi-environment deployment.

  • The solution can be made environment-specific by leveraging branch-based deployments.

  • Different branches (main, dev, feat/*) trigger workflows with environment-specific configurations through GitHub secrets (AWS_REGION, AWS_ROLE_ARN, S3_BACKUP_BUCKET).

  • This allows for separate AWS regions, IAM roles, and cluster sets per environment while maintaining consistent automation logic across all environments.
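One way to make branch-based selection explicit is a small mapping from the triggering branch to an environment name. A workflow step could call it with GITHUB_REF_NAME (the short branch name that GitHub Actions sets) and expose the result, for example through GITHUB_OUTPUT, so that later jobs select the matching GitHub environment and its secrets. The environment names and the function are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map a branch name to an environment name.
# The environment names (production, development, feature) are assumptions.
environment_for_branch() {
  case "$1" in
    main)   echo "production" ;;
    dev)    echo "development" ;;
    feat/*) echo "feature" ;;
    *)      echo "unknown branch: $1" >&2; return 1 ;;
  esac
}
```

For example, a step could run echo "environment=$(environment_for_branch "$GITHUB_REF_NAME")" >> "$GITHUB_OUTPUT". Failing on unknown branches keeps an unexpected branch from silently picking up another environment's secrets.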

Task | Description | Skills required

Clean up resources.

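  1. To remove the IAM role mapping from the aws-auth ConfigMap, run the following command:

    eksctl delete iamidentitymapping \
      --cluster "$cluster" \
      --region "$AWS_REGION" \
      --arn "$ROLE_ARN"

Set cluster, AWS_REGION, and ROLE_ARN to the values that you used when you created the mapping.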
General AWS, Cloud architect

Troubleshooting

Issue | Solution

Authentication Issues

• Verify that the GitHub OIDC provider is configured correctly in IAM.

• Check that the IAM role ARN in the GitHub secrets matches the role created with Terraform (GitHubActionsEKSRole).

• Ensure that the GitHub repository has the necessary secrets configured: AWS_REGION and AWS_ROLE_ARN.

• Validate that the AWS Region setting matches your cluster locations.

Permission Problems

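• Test IAM role permissions locally (the role's trust policy must allow your local principal to assume it): aws sts assume-role --role-arn <role-arn> --role-session-name test-session. Export the temporary credentials that the command returns, and then run aws eks list-clusters.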

• Ensure the role has eks:UpdateClusterConfig and eks:DescribeCluster permissions

Cluster Compatibility

• Confirm that the EKS clusters are running Kubernetes version 1.29 or later: aws eks describe-cluster --name <cluster-name> --query 'cluster.version'

• Verify clusters are in ACTIVE state before enabling Auto Mode

Workflow Failures

• Check GitHub Actions logs for specific error messages

• Verify the workflow file syntax in .github/workflows/auto-mode-pipeline.yml

• Ensure environment variables are properly set in the workflow
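For the permission test under Permission Problems, the temporary credentials from assume-role must be exported before the follow-up command can use them. This sketch assumes that the role's trust policy allows your local principal, and the function names are hypothetical. Source the file (rather than executing it) so that the exports persist in your shell.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: assume a role locally and check what it can do.

# Args: AccessKeyId SecretAccessKey SessionToken (from assume-role text output).
export_credentials() {
  [ "$#" -eq 3 ] || return 1
  export AWS_ACCESS_KEY_ID="$1" AWS_SECRET_ACCESS_KEY="$2" AWS_SESSION_TOKEN="$3"
}

test_role() {
  local role_arn=$1 creds
  creds=$(aws sts assume-role --role-arn "$role_arn" --role-session-name test-session \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' --output text) || return 1
  # Unquoted on purpose: split the tab-separated line into three arguments.
  export_credentials $creds || return 1
  aws sts get-caller-identity && aws eks list-clusters
}
```

When you're done, run unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN to return to your own credentials.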

Related resources