


# Manage compute resources by using nodes
<a name="eks-compute"></a>

A Kubernetes node is a machine that runs containerized applications. Each node has the following components:
+ **[Container runtime](https://kubernetes.io/docs/setup/production-environment/container-runtimes/)** – Software that’s responsible for running the containers.
+ **[kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/)** – Makes sure that containers are healthy and running within their associated Pod.
+ **[kube-proxy](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/)** – Maintains network rules that allow communication to your Pods.

For more information, see [Nodes](https://kubernetes.io/docs/concepts/architecture/nodes/) in the Kubernetes documentation.

Your Amazon EKS cluster can schedule Pods on any combination of [EKS Auto Mode managed nodes](automode.md), [self-managed nodes](worker.md), [Amazon EKS managed node groups](managed-node-groups.md), [AWS Fargate](fargate.md), and [Amazon EKS Hybrid Nodes](hybrid-nodes-overview.md). To learn more about nodes deployed in your cluster, see [View Kubernetes resources in the AWS Management Console](view-kubernetes-resources.md).
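For a quick look at the nodes that are currently registered with your cluster, including the kubelet version and container runtime that each node reports, you can use `kubectl`. This assumes your `kubeconfig` is already set up for the cluster; the cluster name and Region are placeholders.

```
# Point kubectl at your cluster, then list nodes with version and runtime details.
aws eks update-kubeconfig --region region-code --name my-cluster
kubectl get nodes -o wide
```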

**Note**  
Excluding hybrid nodes, nodes must be in the same VPC as the subnets you selected when you created the cluster. However, the nodes don’t have to be in the same subnets.

## Compare compute options
<a name="_compare_compute_options"></a>

The following table provides several criteria to evaluate when deciding which options best meet your requirements. Self-managed nodes are another option that supports all of the criteria listed, but they require significantly more manual maintenance. For more information, see [Maintain nodes yourself with self-managed nodes](worker.md).

**Note**  
Bottlerocket has some specific differences from the general information in this table. For more information, see the Bottlerocket [documentation](https://github.com/bottlerocket-os/bottlerocket/blob/develop/README.md) on GitHub.


| Criteria | EKS managed node groups | EKS Auto Mode | Amazon EKS Hybrid Nodes | 
| --- | --- | --- | --- | 
|  Can be deployed to [AWS Outposts](https://docs.aws.amazon.com/outposts/latest/userguide/what-is-outposts.html)   |  No  |  No  |  No  | 
|  Can be deployed to an [AWS Local Zone](local-zones.md)   |  Yes  |  No  |  No  | 
|  Can run containers that require Windows  |  Yes  |  No  |  No  | 
|  Can run containers that require Linux  |  Yes  |  Yes  |  Yes  | 
|  Can run workloads that require the Inferentia chip  |   [Yes](inferentia-support.md) – Amazon Linux nodes only  |  Yes  |  No  | 
|  Can run workloads that require a GPU  |   [Yes](eks-optimized-ami.md#gpu-ami) – Amazon Linux nodes only  |  Yes  |  Yes  | 
|  Can run workloads that require Arm processors  |   [Yes](eks-optimized-ami.md#arm-ami)   |  Yes  |  Yes  | 
|  Can run AWS [Bottlerocket](https://aws.amazon.com/bottlerocket/)   |  Yes  |  Yes  |  No  | 
|  Pods share CPU, memory, storage, and network resources with other Pods.  |  Yes  |  Yes  |  Yes  | 
|  Must deploy and manage Amazon EC2 instances  |  Yes  |  No - Learn about [EC2 managed instances](automode-learn-instances.md)   |  Yes – the on-premises physical or virtual machines are managed by you with your choice of tooling.  | 
|  Must secure, maintain, and patch the operating system of Amazon EC2 instances  |  Yes  |  No  |  Yes – the operating system running on your physical or virtual machines is managed by you with your choice of tooling.  | 
|  Can provide bootstrap arguments at deployment of a node, such as extra [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) arguments.  |  Yes – Using `eksctl` or a [launch template](launch-templates.md) with a custom AMI.  |  No - [Use a `NodeClass` to configure nodes](create-node-class.md)   |  Yes - you can customize bootstrap arguments with nodeadm. See [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md).  | 
|  Can assign IP addresses to Pods from a different CIDR block than the IP address assigned to the node.  |  Yes – Using a launch template with a custom AMI. For more information, see [Customize managed nodes with launch templates](launch-templates.md).  |  No  |  Yes - see [Configure CNI for hybrid nodes](hybrid-nodes-cni.md).  | 
|  Can SSH into node  |  Yes  |  No - [Learn how to troubleshoot nodes](auto-troubleshoot.md)   |  Yes  | 
|  Can deploy your own custom AMI to nodes  |  Yes – Using a [launch template](launch-templates.md)   |  No  |  Yes  | 
|  Can deploy your own custom CNI to nodes  |  Yes – Using a [launch template](launch-templates.md) with a custom AMI  |  No  |  Yes  | 
|  Must update node AMI on your own  |   [Yes](update-managed-node-group.md) – If you deployed an Amazon EKS optimized AMI, you’re notified in the Amazon EKS console when updates are available. You can perform the update with one click in the console. If you deployed a custom AMI, you’re not notified in the Amazon EKS console when updates are available. You must perform the update on your own.  |  No  |  Yes - the operating system running on your physical or virtual machines is managed by you with your choice of tooling. See [Prepare operating system for hybrid nodes](hybrid-nodes-os.md).  | 
|  Must update node Kubernetes version on your own  |   [Yes](update-managed-node-group.md) – If you deployed an Amazon EKS optimized AMI, you’re notified in the Amazon EKS console when updates are available. You can perform the update with one click in the console. If you deployed a custom AMI, you’re not notified in the Amazon EKS console when updates are available. You must perform the update on your own.  |  No  |  Yes - you manage hybrid node upgrades with your own choice of tooling or with `nodeadm`. See [Upgrade hybrid nodes for your cluster](hybrid-nodes-upgrade.md).  | 
|  Can use Amazon EBS storage with Pods  |   [Yes](ebs-csi.md)   |  Yes, as an integrated capability. Learn how to [create a storage class.](create-storage-class.md)   |  No  | 
|  Can use Amazon EFS storage with Pods  |   [Yes](efs-csi.md)   |  Yes  |  No  | 
|  Can use Amazon S3 Files storage with Pods  |   [Yes](s3files-csi.md)   |  Yes  |  No  | 
|  Can use Amazon FSx for Lustre storage with Pods  |   [Yes](fsx-csi.md)   |  Yes  |  No  | 
|  Can use Network Load Balancer for services  |   [Yes](network-load-balancing.md)   |  Yes  |  Yes - must use target type `ip`.  | 
|  Pods can run in a public subnet  |  Yes  |  Yes  |  No - Pods run in your on-premises environment.  | 
|  Can assign different VPC security groups to individual Pods  |   [Yes](security-groups-for-pods.md) – Linux nodes only  |  No  |  No  | 
|  Can run Kubernetes DaemonSets  |  Yes  |  Yes  |  Yes  | 
|  Support `HostPort` and `HostNetwork` in the Pod manifest  |  Yes  |  Yes  |  Yes  | 
|   AWS Region availability  |   [All Amazon EKS supported regions](https://docs.aws.amazon.com/general/latest/gr/eks.html)   |   [All Amazon EKS supported regions](https://docs.aws.amazon.com/general/latest/gr/eks.html)   |   [All Amazon EKS supported regions](https://docs.aws.amazon.com/general/latest/gr/eks.html) except the AWS GovCloud (US) Regions and the China Regions.  | 
|  Can run containers on Amazon EC2 dedicated hosts  |  Yes  |  No  |  No  | 
|  Pricing  |  Cost of Amazon EC2 instance that runs multiple Pods. For more information, see [Amazon EC2 pricing](https://aws.amazon.com/ec2/pricing/).  |  When EKS Auto Mode is enabled in your cluster, you pay a separate fee, in addition to the standard EC2 instance charges, for the instances launched using Auto Mode’s compute capability. The amount varies with the instance type launched and the AWS region where your cluster is located. For more information, see [Amazon EKS pricing](https://aws.amazon.com/eks/pricing/).  |  Cost of hybrid nodes vCPU per hour. For more information, see [Amazon EKS pricing](https://aws.amazon.com/eks/pricing/).  | 

# Simplify node lifecycle with managed node groups
<a name="managed-node-groups"></a>

Amazon EKS managed node groups automate the provisioning and lifecycle management of nodes (Amazon EC2 instances) for Amazon EKS Kubernetes clusters.

With Amazon EKS managed node groups, you don’t need to separately provision or register the Amazon EC2 instances that provide compute capacity to run your Kubernetes applications. You can create, automatically update, or terminate nodes for your cluster with a single operation. Node updates and terminations automatically drain nodes to ensure that your applications stay available.
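As an example of a single-operation update, the following AWS CLI command starts a rolling update of a node group to the latest AMI release version for its Kubernetes version, draining nodes as it goes. The cluster and node group names are placeholders.

```
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name my-mng
```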

Every managed node is provisioned as part of an Amazon EC2 Auto Scaling group that’s managed for you by Amazon EKS. Every resource, including the instances and Auto Scaling groups, runs within your AWS account. Each node group runs across multiple Availability Zones that you define.

Managed node groups can also optionally leverage node auto repair, which continuously monitors the health of nodes. It automatically reacts to detected problems and replaces nodes when possible. This helps maintain the overall availability of the cluster with minimal manual intervention. For more information, see [Detect node health issues and enable automatic node repair](node-health.md).

You can add a managed node group to new or existing clusters using the Amazon EKS console, `eksctl`, AWS CLI, AWS API, or infrastructure as code tools including AWS CloudFormation. Nodes launched as part of a managed node group are automatically tagged for auto-discovery by the Kubernetes [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md). You can use the node group to apply Kubernetes labels to nodes and update them at any time.
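For example, you can apply or remove node labels through the node group itself instead of labeling each node. This is a minimal sketch using the AWS CLI shorthand syntax; the label keys and values are placeholders.

```
# Add or update one label on every node in the node group and remove another.
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-mng \
  --labels 'addOrUpdateLabels={workload-type=batch},removeLabels=team'
```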

There are no additional costs to use Amazon EKS managed node groups; you pay only for the AWS resources that you provision. These include Amazon EC2 instances, Amazon EBS volumes, Amazon EKS cluster hours, and any other AWS infrastructure. There are no minimum fees and no upfront commitments.

To get started with a new Amazon EKS cluster and managed node group, see [Get started with Amazon EKS – AWS Management Console and AWS CLI](getting-started-console.md).

To add a managed node group to an existing cluster, see [Create a managed node group for your cluster](create-managed-node-group.md).

## Managed node groups concepts
<a name="managed-node-group-concepts"></a>
+ Amazon EKS managed node groups create and manage Amazon EC2 instances for you.
+ Every managed node is provisioned as part of an Amazon EC2 Auto Scaling group that’s managed for you by Amazon EKS. Moreover, every resource, including Amazon EC2 instances and Auto Scaling groups, runs within your AWS account.
+ Amazon EKS periodically syncs the managed node group’s scaling configuration to match the actual Auto Scaling group values. If an external actor such as Cluster Autoscaler modifies the Auto Scaling group’s size, `DescribeNodegroup` will eventually reflect those changes. When you initiate a node group update or upgrade without explicitly modifying the scaling configuration, the workflow uses the current Auto Scaling group values rather than the node group’s stored scaling configuration. The stored scaling configuration only takes precedence when you explicitly include it in an `UpdateNodegroupConfig` request.
+ The Auto Scaling group of a managed node group spans every subnet that you specify when you create the group.
+ Amazon EKS tags managed node group resources so that they are configured to use the Kubernetes [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md).
**Important**  
If you are running a stateful application across multiple Availability Zones that is backed by Amazon EBS volumes and using the Kubernetes [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md), you should configure multiple node groups, each scoped to a single Availability Zone. In addition, you should enable the `--balance-similar-node-groups` feature.
+ You can use a custom launch template for a greater level of flexibility and customization when deploying managed nodes. For example, you can specify extra `kubelet` arguments and use a custom AMI. For more information, see [Customize managed nodes with launch templates](launch-templates.md). If you don’t use a custom launch template when first creating a managed node group, an auto-generated launch template is used. Don’t manually modify this auto-generated template, or errors might occur.
+ Amazon EKS follows the shared responsibility model for CVEs and security patches on managed node groups. When managed nodes run an Amazon EKS optimized AMI, Amazon EKS is responsible for building patched versions of the AMI when bugs or issues are reported, and for publishing a fix. However, you’re responsible for deploying these patched AMI versions to your managed node groups. When managed nodes run a custom AMI, you’re responsible for building patched versions of the AMI when bugs or issues are reported and then deploying the AMI. For more information, see [Update a managed node group for your cluster](update-managed-node-group.md).
+ Amazon EKS managed node groups can be launched in both public and private subnets. If you launch a managed node group in a public subnet on or after April 22, 2020, the subnet must have `MapPublicIpOnLaunch` set to `true` for the instances to successfully join a cluster. If the public subnet was created using `eksctl` or the [Amazon EKS vended AWS CloudFormation templates](creating-a-vpc.md) on or after March 26, 2020, then this setting is already set to `true`. If the public subnets were created before March 26, 2020, you must change the setting manually. For more information, see [Modifying the public IPv4 addressing attribute for your subnet](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-ip-addressing.html#subnet-public-ip).
+ When deploying a managed node group in private subnets, you must ensure that it can access Amazon ECR for pulling container images. You can do this by connecting a NAT gateway to the route table of the subnet or by adding the following [AWS PrivateLink VPC endpoints](https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html#ecr-setting-up-vpc-create):
  + Amazon ECR API endpoint interface – `com.amazonaws.region-code.ecr.api` 
  + Amazon ECR Docker registry API endpoint interface – `com.amazonaws.region-code.ecr.dkr` 
  + Amazon S3 gateway endpoint – `com.amazonaws.region-code.s3` 

  For other commonly-used services and endpoints, see [Deploy private clusters with limited internet access](private-clusters.md).
+ Managed node groups can’t be deployed on [AWS Outposts](eks-outposts.md) or in [AWS Wavelength](https://docs.aws.amazon.com/wavelength/). Managed node groups can be created on [AWS Local Zones](https://aws.amazon.com/about-aws/global-infrastructure/localzones/). For more information, see [Launch low-latency EKS clusters with AWS Local Zones](local-zones.md).
+ You can create multiple managed node groups within a single cluster. For example, you can create one node group with the standard Amazon EKS optimized Amazon Linux AMI for some workloads and another with the GPU variant for workloads that require GPU support.
+ If your managed node group encounters an [Amazon EC2 instance status check](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html) failure, Amazon EKS returns an error code to help you to diagnose the issue. For more information, see [Managed node group error codes](troubleshooting.md#troubleshoot-managed-node-groups).
+ Amazon EKS adds Kubernetes labels to managed node group instances. These Amazon EKS provided labels are prefixed with `eks.amazonaws.com`.
+ Amazon EKS automatically drains nodes using the Kubernetes API during terminations or updates.
+ Pod disruption budgets aren’t respected when terminating a node with `AZRebalance` or reducing the desired node count. These actions try to evict Pods on the node. But if it takes more than 15 minutes, the node is terminated regardless of whether all Pods on the node are terminated. To extend the period until the node is terminated, add a lifecycle hook to the Auto Scaling group. For more information, see [Add lifecycle hooks](https://docs.aws.amazon.com/autoscaling/ec2/userguide/adding-lifecycle-hooks.html) in the *Amazon EC2 Auto Scaling User Guide*.
+ In order to run the drain process correctly after receiving a Spot interruption notification or a capacity rebalance notification, `CapacityRebalance` must be set to `true`.
+ Updating managed node groups respects the Pod disruption budgets that you set for your Pods. For more information, see [Understand each phase of node updates](managed-node-update-behavior.md).
+ There are no additional costs to use Amazon EKS managed node groups. You only pay for the AWS resources that you provision.
+ If you want to encrypt Amazon EBS volumes for your nodes, you can deploy the nodes using a launch template. To deploy managed nodes with encrypted Amazon EBS volumes without using a launch template, encrypt all new Amazon EBS volumes created in your account, as shown in the example after this list. For more information, see [Encryption by default](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html#encryption-by-default) in the *Amazon EC2 User Guide*.
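As a minimal example, the following AWS CLI commands turn on EBS encryption by default for the current Region and confirm the setting. After this is enabled, new volumes, including node root volumes, are created encrypted.

```
# Enable EBS encryption by default in the current Region, then confirm it.
aws ec2 enable-ebs-encryption-by-default
aws ec2 get-ebs-encryption-by-default
```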

## Managed node group capacity types
<a name="managed-node-group-capacity-types"></a>

When creating a managed node group, you can choose either the On-Demand or Spot capacity type. Amazon EKS deploys a managed node group with an Amazon EC2 Auto Scaling group that contains either only On-Demand Instances or only Amazon EC2 Spot Instances. You can schedule Pods for fault tolerant applications to Spot managed node groups, and fault intolerant applications to On-Demand node groups within a single Kubernetes cluster. By default, a managed node group deploys On-Demand Amazon EC2 instances.
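For example, when you create a managed node group with the AWS CLI, the capacity type is a single parameter. This is a sketch; the cluster name, node group name, role ARN, subnets, and instance types are placeholders.

```
# Create a managed node group that uses Spot capacity across several instance types.
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name spot-mng \
  --capacity-type SPOT \
  --instance-types c5.xlarge c5a.xlarge c5d.xlarge \
  --node-role arn:aws:iam::111122223333:role/AmazonEKSNodeRole \
  --subnets subnet-12345678 subnet-87654321 \
  --scaling-config minSize=1,maxSize=5,desiredSize=2
```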

### On-Demand
<a name="managed-node-group-capacity-types-on-demand"></a>

With On-Demand Instances, you pay for compute capacity by the second, with no long-term commitments.

By default, if you don’t specify a **Capacity Type**, the managed node group is provisioned with On-Demand Instances. A managed node group configures an Amazon EC2 Auto Scaling group on your behalf with the following settings applied:
+ The allocation strategy to provision On-Demand capacity is set to `prioritized`. Managed node groups use the order of instance types passed in the API to determine which instance type to use first when fulfilling On-Demand capacity. For example, you might specify three instance types in the following order: `c5.large`, `c4.large`, and `c3.large`. When your On-Demand Instances are launched, the managed node group fulfills On-Demand capacity by starting with `c5.large`, then `c4.large`, and then `c3.large`. For more information, see [Amazon EC2 Auto Scaling group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-purchase-options.html#asg-allocation-strategies) in the *Amazon EC2 Auto Scaling User Guide*.
+ Amazon EKS adds the following Kubernetes label to all nodes in your managed node group that specifies the capacity type: `eks.amazonaws.com/capacityType: ON_DEMAND`. You can use this label to schedule stateful or fault intolerant applications on On-Demand nodes.
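As a sketch of how you might use this label, the following Deployment pins its Pods to On-Demand nodes with a `nodeSelector`. The Deployment name and container image are placeholders.

```
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: stateful-api
  template:
    metadata:
      labels:
        app: stateful-api
    spec:
      # Schedule only onto On-Demand nodes in the managed node group.
      nodeSelector:
        eks.amazonaws.com/capacityType: ON_DEMAND
      containers:
      - name: app
        image: public.ecr.aws/nginx/nginx:latest
EOF
```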

### Spot
<a name="managed-node-group-capacity-types-spot"></a>

Amazon EC2 Spot Instances are spare Amazon EC2 capacity that offers steep discounts off of On-Demand prices. Amazon EC2 Spot Instances can be interrupted with a two-minute interruption notice when EC2 needs the capacity back. For more information, see [Spot Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html) in the *Amazon EC2 User Guide*. You can configure a managed node group with Amazon EC2 Spot Instances to optimize costs for the compute nodes running in your Amazon EKS cluster.

To use Spot Instances inside a managed node group, create a managed node group by setting the capacity type as `spot`. A managed node group configures an Amazon EC2 Auto Scaling group on your behalf with the following Spot best practices applied:
+ To ensure that your Spot nodes are provisioned in the optimal Spot capacity pools, the allocation strategy is set to one of the following:
  +  `price-capacity-optimized` (PCO) – When creating new node groups in a cluster with Kubernetes version `1.28` or higher, the allocation strategy is set to `price-capacity-optimized`. However, the allocation strategy won’t be changed for node groups already created with `capacity-optimized` before Amazon EKS managed node groups started to support PCO.
  +  `capacity-optimized` (CO) – When creating new node groups in a cluster with Kubernetes version `1.27` or lower, the allocation strategy is set to `capacity-optimized`.

  To increase the number of Spot capacity pools available for allocating capacity from, configure a managed node group to use multiple instance types.
+ Amazon EC2 Spot Capacity Rebalancing is enabled so that Amazon EKS can gracefully drain and rebalance your Spot nodes to minimize application disruption when a Spot node is at elevated risk of interruption. For more information, see [Amazon EC2 Auto Scaling Capacity Rebalancing](https://docs.aws.amazon.com/autoscaling/ec2/userguide/capacity-rebalance.html) in the *Amazon EC2 Auto Scaling User Guide*.
  + When a Spot node receives a rebalance recommendation, Amazon EKS automatically attempts to launch a new replacement Spot node.
  + If a Spot two-minute interruption notice arrives before the replacement Spot node is in a `Ready` state, Amazon EKS starts draining the Spot node that received the rebalance recommendation. Amazon EKS drains the node on a best-effort basis. As a result, there’s no guarantee that Amazon EKS will wait for the replacement node to join the cluster before draining the existing node.
  + When a replacement Spot node is bootstrapped and in the `Ready` state on Kubernetes, Amazon EKS cordons and drains the Spot node that received the rebalance recommendation. Cordoning the Spot node ensures that the service controller doesn’t send any new requests to this Spot node. It also removes it from its list of healthy, active Spot nodes. Draining the Spot node ensures that running Pods are evicted gracefully.
+ Amazon EKS adds the following Kubernetes label to all nodes in your managed node group that specifies the capacity type: `eks.amazonaws.com/capacityType: SPOT`. You can use this label to schedule fault tolerant applications on Spot nodes.
**Important**  
EC2 issues a [Spot interruption notice](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-instance-termination-notices.html) two minutes prior to terminating your Spot Instance. However, Pods on Spot nodes may not receive the full 2-minute window for graceful shutdown. When EC2 issues the notice, there is a delay before Amazon EKS begins evicting Pods. Evictions occur sequentially to protect the Kubernetes API server, so during multiple simultaneous Spot reclamations, some Pods may receive delayed eviction notices. Pods may be forcibly terminated without receiving termination signals, particularly on nodes with high Pod density, during concurrent reclamations, or when using long termination grace periods. For Spot workloads, we recommend designing applications to be interruption-tolerant, using termination grace periods of 30 seconds or less, avoiding long-running preStop hooks, and monitoring Pod eviction metrics to understand actual grace periods in your clusters. For workloads requiring guaranteed graceful termination, we recommend using On-Demand capacity instead.

When deciding whether to deploy a node group with On-Demand or Spot capacity, you should consider the following conditions:
+ Spot Instances are a good fit for stateless, fault-tolerant, flexible applications. These include batch and machine learning training workloads, big data ETLs such as Apache Spark, queue processing applications, and stateless API endpoints. Because Spot is spare Amazon EC2 capacity, which can change over time, we recommend that you use Spot capacity for interruption-tolerant workloads. More specifically, Spot capacity is suitable for workloads that can tolerate periods where the required capacity isn’t available.
+ We recommend that you use On-Demand for applications that are fault intolerant. This includes cluster management tools such as monitoring and operational tools, deployments that require `StatefulSets`, and stateful applications, such as databases.
+ To maximize the availability of your applications while using Spot Instances, we recommend that you configure a Spot managed node group to use multiple instance types. We recommend applying the following rules when using multiple instance types (an example config file follows this list):
  + Within a managed node group, if you’re using the [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md), we recommend using a flexible set of instance types with the same amount of vCPU and memory resources. This is to ensure that the nodes in your cluster scale as expected. For example, if you need four vCPUs and eight GiB memory, use `c3.xlarge`, `c4.xlarge`, `c5.xlarge`, `c5d.xlarge`, `c5a.xlarge`, `c5n.xlarge`, or other similar instance types.
  + To enhance application availability, we recommend deploying multiple Spot managed node groups. For this, each group should use a flexible set of instance types that have the same vCPU and memory resources. For example, if you need 4 vCPUs and 8 GiB memory, we recommend that you create one managed node group with `c3.xlarge`, `c4.xlarge`, `c5.xlarge`, `c5d.xlarge`, `c5a.xlarge`, `c5n.xlarge`, or other similar instance types, and a second managed node group with `m3.xlarge`, `m4.xlarge`, `m5.xlarge`, `m5d.xlarge`, `m5a.xlarge`, `m5n.xlarge` or other similar instance types.
  + When deploying your node group with the Spot capacity type that’s using a custom launch template, use the API to pass multiple instance types. Don’t pass a single instance type through the launch template. For more information about deploying a node group using a launch template, see [Customize managed nodes with launch templates](launch-templates.md).
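The following `eksctl` config file is a minimal sketch of these recommendations: two Spot managed node groups, each using a flexible set of instance types with similar vCPU and memory. The cluster name, Region, and instance types are placeholders, and the field names assume a recent `eksctl` version.

```
cat >spot-nodegroups.yaml <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: region-code
managedNodeGroups:
- name: spot-c-family
  spot: true
  instanceTypes: ["c5.xlarge", "c5a.xlarge", "c5d.xlarge"]
  desiredCapacity: 2
- name: spot-m-family
  spot: true
  instanceTypes: ["m5.xlarge", "m5a.xlarge", "m5d.xlarge"]
  desiredCapacity: 2
EOF

eksctl create nodegroup --config-file spot-nodegroups.yaml
```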

# Create a managed node group for your cluster
<a name="create-managed-node-group"></a>

This topic describes how you can launch Amazon EKS managed node groups of nodes that register with your Amazon EKS cluster. After the nodes join the cluster, you can deploy Kubernetes applications to them.

If this is your first time launching an Amazon EKS managed node group, we recommend that you instead follow one of our guides in [Get started with Amazon EKS](getting-started.md). These guides provide walkthroughs for creating an Amazon EKS cluster with nodes.

**Important**  
Amazon EKS nodes are standard Amazon EC2 instances. You’re billed based on the normal Amazon EC2 prices. For more information, see [Amazon EC2 Pricing](https://aws.amazon.com/ec2/pricing/).
You can’t create managed nodes in an AWS Region where you have AWS Outposts or AWS Wavelength enabled. You can create self-managed nodes instead. For more information, see [Create self-managed Amazon Linux nodes](launch-workers.md), [Create self-managed Microsoft Windows nodes](launch-windows-workers.md), and [Create self-managed Bottlerocket nodes](launch-node-bottlerocket.md). You can also create a self-managed Amazon Linux node group on an Outpost. For more information, see [Create Amazon Linux nodes on AWS Outposts](eks-outposts-self-managed-nodes.md).
If you don’t [specify an AMI ID](launch-templates.md#launch-template-custom-ami) for the `bootstrap.sh` file included with Amazon EKS optimized Linux or Bottlerocket, managed node groups enforce a maximum number on the value of `maxPods`. For instances with less than 30 vCPUs, the maximum number is `110`. For instances with greater than 30 vCPUs, the maximum number jumps to `250`. This enforcement overrides other `maxPods` configurations, including `maxPodsExpression`. For more information about how `maxPods` is determined and how to customize it, see [How maxPods is determined](choosing-instance-type.md#max-pods-precedence).
Before you create a managed node group, you need the following:
+ An existing Amazon EKS cluster that is in the `ACTIVE` state. To deploy one, see [Create an Amazon EKS cluster](create-cluster.md). You can verify the cluster state with the command that follows this list.
+ An existing IAM role for the nodes to use. To create one, see [Amazon EKS node IAM role](create-node-role.md). If this role doesn’t have either of the IAM policies for the Amazon VPC CNI plugin, the separate role described in the next item is required for the VPC CNI Pods.
+ (Optional, but recommended) The Amazon VPC CNI plugin for Kubernetes add-on configured with its own IAM role that has the necessary IAM policy attached to it. For more information, see [Configure Amazon VPC CNI plugin to use IRSA](cni-iam-role.md).
+ Familiarity with the considerations listed in [Choose an optimal Amazon EC2 node instance type](choosing-instance-type.md). Depending on the instance type you choose, there may be additional prerequisites for your cluster and VPC.
+ To add a Windows managed node group, you must first enable Windows support for your cluster. For more information, see [Deploy Windows nodes on EKS clusters](windows-support.md).
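A quick way to verify that the cluster is `ACTIVE` before you continue is to query its status. The cluster name and Region are placeholders.

```
aws eks describe-cluster \
  --name my-cluster \
  --region region-code \
  --query "cluster.status" \
  --output text
```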

You can create a managed node group with either of the following:
+  [`eksctl`](#eksctl_create_managed_nodegroup) 
+  [AWS Management Console](#console_create_managed_nodegroup) 

## `eksctl`
<a name="eksctl_create_managed_nodegroup"></a>

 **Create a managed node group with eksctl** 

This procedure requires `eksctl` version `0.215.0` or later. You can check your version with the following command:

```
eksctl version
```

For instructions on how to install or upgrade `eksctl`, see [Installation](https://eksctl.io/installation) in the `eksctl` documentation.

1. (Optional) If the **AmazonEKS_CNI_Policy** managed IAM policy is attached to your [Amazon EKS node IAM role](create-node-role.md), we recommend assigning it to an IAM role that you associate to the Kubernetes `aws-node` service account instead. For more information, see [Configure Amazon VPC CNI plugin to use IRSA](cni-iam-role.md).

1. Create a managed node group with or without using a custom launch template. Manually specifying a launch template allows for greater customization of a node group. For example, it can allow deploying a custom AMI or providing arguments to the `bootstrap.sh` script in an Amazon EKS optimized AMI. For a complete list of every available option and default, enter the following command.

   ```
   eksctl create nodegroup --help
   ```

   In the following command, replace *my-cluster* with the name of your cluster and replace *my-mng* with the name of your node group. The node group name can’t be longer than 63 characters. It must start with a letter or digit, but can also include hyphens and underscores for the remaining characters.
**Important**  
If you don’t use a custom launch template when first creating a managed node group, don’t use one at a later time for the node group. If you didn’t specify a custom launch template, an auto-generated launch template is created for the node group. Don’t modify this auto-generated launch template manually, because doing so might cause errors.

 **Without a launch template** 

 `eksctl` creates a default Amazon EC2 launch template in your account and deploys the node group using a launch template that it creates based on options that you specify. Before specifying a value for `--node-type`, see [Choose an optimal Amazon EC2 node instance type](choosing-instance-type.md).

Replace *ami-family* with an allowed keyword. For more information, see [Setting the node AMI Family](https://eksctl.io/usage/custom-ami-support/#setting-the-node-ami-family) in the `eksctl` documentation. Replace *my-key* with the name of your Amazon EC2 key pair or public key. This key is used to SSH into your nodes after they launch.

**Note**  
For Windows, this command doesn’t enable SSH. Instead, it associates your Amazon EC2 key pair with the instance and allows you to RDP into the instance.

If you don’t already have an Amazon EC2 key pair, you can create one in the AWS Management Console. For Linux information, see [Amazon EC2 key pairs and Linux instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*. For Windows information, see [Amazon EC2 key pairs and Windows instances](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*.

We recommend blocking Pod access to IMDS if the following conditions are true:
+ You plan to assign IAM roles to all of your Kubernetes service accounts so that Pods only have the minimum permissions that they need.
+ No Pods in the cluster require access to the Amazon EC2 instance metadata service (IMDS) for other reasons, such as retrieving the current AWS Region.

For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).

If you want to block Pod access to IMDS, then add the `--disable-pod-imds` option to the following command.

```
eksctl create nodegroup \
  --cluster my-cluster \
  --region region-code \
  --name my-mng \
  --node-ami-family ami-family \
  --node-type m5.large \
  --nodes 3 \
  --nodes-min 2 \
  --nodes-max 4 \
  --ssh-access \
  --ssh-public-key my-key
```

Your instances can optionally assign a significantly higher number of IP addresses to Pods, assign IP addresses to Pods from a different CIDR block than the instance’s, and be deployed to a cluster without internet access. For more information, see [Assign more IP addresses to Amazon EKS nodes with prefixes](cni-increase-ip-addresses.md), [Deploy Pods in alternate subnets with custom networking](cni-custom-network.md), and [Deploy private clusters with limited internet access](private-clusters.md) for additional options to add to the previous command.

Managed node groups calculate and apply a single value for the maximum number of Pods that can run on each node of your node group, based on instance type. If you create a node group with different instance types, the smallest value calculated across all instance types is applied as the maximum number of Pods that can run on every instance type in the node group. For more information about how this value is calculated, see [How maxPods is determined](choosing-instance-type.md#max-pods-precedence).
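After your nodes join the cluster, you can confirm the value that was applied by checking a node’s allocatable Pod count. The node name is a placeholder.

```
kubectl get node my-node-name -o jsonpath='{.status.allocatable.pods}{"\n"}'
```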

 **With a launch template** 

The launch template must already exist and must meet the requirements specified in [Launch template configuration basics](launch-templates.md#launch-template-basics). We recommend blocking Pod access to IMDS if the following conditions are true:
+ You plan to assign IAM roles to all of your Kubernetes service accounts so that Pods only have the minimum permissions that they need.
+ No Pods in the cluster require access to the Amazon EC2 instance metadata service (IMDS) for other reasons, such as retrieving the current AWS Region.

For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).

If you want to block Pod access to IMDS, then specify the necessary settings in the launch template.

1. Copy the following contents to your device. Replace the example values and then run the modified command to create the `eks-nodegroup.yaml` file. Several settings that you specify when deploying without a launch template are moved into the launch template. If you don’t specify a `version`, the template’s default version is used.

   ```
   cat >eks-nodegroup.yaml <<EOF
   apiVersion: eksctl.io/v1alpha5
   kind: ClusterConfig
   metadata:
     name: my-cluster
     region: region-code
   managedNodeGroups:
   - name: my-mng
     launchTemplate:
       id: lt-id
       version: "1"
   EOF
   ```

   For a complete list of `eksctl` config file settings, see [Config file schema](https://eksctl.io/usage/schema/) in the `eksctl` documentation. Your instances can optionally assign a significantly higher number of IP addresses to Pods, assign IP addresses to Pods from a different CIDR block than the instance’s, and be deployed to a cluster without outbound internet access. For more information, see [Assign more IP addresses to Amazon EKS nodes with prefixes](cni-increase-ip-addresses.md), [Deploy Pods in alternate subnets with custom networking](cni-custom-network.md), and [Deploy private clusters with limited internet access](private-clusters.md) for additional options to add to the config file.

   If you didn’t specify an AMI ID in your launch template, managed node groups calculate and apply a single value for the maximum number of Pods that can run on each node of your node group, based on instance type. If you create a node group with different instance types, the smallest value calculated across all instance types is applied as the maximum number of Pods that can run on every instance type in the node group. For more information about how this value is calculated, see [How maxPods is determined](choosing-instance-type.md#max-pods-precedence).

   If you specified an AMI ID in your launch template, specify the maximum number of Pods that can run on each node of your node group if you’re using [custom networking](cni-custom-network.md) or want to [increase the number of IP addresses assigned to your instance](cni-increase-ip-addresses.md). For more information, see [How maxPods is determined](choosing-instance-type.md#max-pods-precedence).

1. Deploy the node group with the following command.

   ```
   eksctl create nodegroup --config-file eks-nodegroup.yaml
   ```

## AWS Management Console
<a name="console_create_managed_nodegroup"></a>

 **Create a managed node group using the AWS Management Console** 

1. Wait for your cluster status to show as `ACTIVE`. You can’t create a managed node group for a cluster that isn’t already `ACTIVE`.

1. Open the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. Choose the name of the cluster that you want to create a managed node group in.

1. Select the **Compute** tab.

1. Choose **Add node group**.

1. On the **Configure node group** page, fill out the parameters accordingly, and then choose **Next**.
   +  **Name** – Enter a unique name for your managed node group. The node group name can’t be longer than 63 characters. It must start with a letter or digit, but can also include hyphens and underscores for the remaining characters.
   +  **Node IAM role** – Choose the node instance role to use with your node group. For more information, see [Amazon EKS node IAM role](create-node-role.md).
**Important**  
You can’t use the same role that is used to create any clusters.
We recommend using a role that’s not currently in use by any self-managed node group and that you don’t plan to use with a new self-managed node group. For more information, see [Delete a managed node group from your cluster](delete-managed-node-group.md).
   +  **Use launch template** – (Optional) Choose if you want to use an existing launch template. Select a **Launch Template Name**. Then, select a **Launch template version**. If you don’t select a version, then Amazon EKS uses the template’s default version. Launch templates allow for more customization of your node group, such as allowing you to deploy a custom AMI, assign a significantly higher number of IP addresses to Pods, assign IP addresses to Pods from a different CIDR block than the instance’s, and deploy nodes to a cluster without outbound internet access. For more information, see [Assign more IP addresses to Amazon EKS nodes with prefixes](cni-increase-ip-addresses.md), [Deploy Pods in alternate subnets with custom networking](cni-custom-network.md), and [Deploy private clusters with limited internet access](private-clusters.md).

     The launch template must meet the requirements in [Customize managed nodes with launch templates](launch-templates.md). If you don’t use your own launch template, the Amazon EKS API creates a default Amazon EC2 launch template in your account and deploys the node group using the default launch template.

     If you implement [IAM roles for service accounts](iam-roles-for-service-accounts.md), assign necessary permissions directly to every Pod that requires access to AWS services, and no Pods in your cluster require access to IMDS for other reasons, such as retrieving the current AWS Region, then you can also disable access to IMDS for Pods that don’t use host networking in a launch template. For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).
   +  **Kubernetes labels** – (Optional) You can choose to apply Kubernetes labels to the nodes in your managed node group.
   +  **Kubernetes taints** – (Optional) You can choose to apply Kubernetes taints to the nodes in your managed node group. The available options in the **Effect** menu are `NoSchedule`, `NoExecute`, and `PreferNoSchedule`. For more information, see [Recipe: Prevent pods from being scheduled on specific nodes](node-taints-managed-node-groups.md).
   +  **Tags** – (Optional) You can choose to tag your Amazon EKS managed node group. These tags don’t propagate to other resources in the node group, such as Auto Scaling groups or instances. For more information, see [Organize Amazon EKS resources with tags](eks-using-tags.md).

1. On the **Set compute and scaling configuration** page, fill out the parameters accordingly, and then choose **Next**.
   +  **AMI type** – Select an AMI type. If you are deploying Arm instances, be sure to review the considerations in [Amazon EKS optimized Arm Amazon Linux AMIs](eks-optimized-ami.md#arm-ami) before deploying.

     If you specified a launch template on the previous page, and specified an AMI in the launch template, then you can’t select a value. The value from the template is displayed. The AMI specified in the template must meet the requirements in [Specifying an AMI](launch-templates.md#launch-template-custom-ami).
   +  **Capacity type** – Select a capacity type. For more information about choosing a capacity type, see [Managed node group capacity types](managed-node-groups.md#managed-node-group-capacity-types). You can’t mix different capacity types within the same node group. If you want to use both capacity types, create separate node groups, each with their own capacity and instance types. See [Reserve GPUs for managed node groups](https://docs.aws.amazon.com/eks/latest/userguide/capacity-blocks-mng.html) for information on provisioning and scaling GPU-accelerated worker nodes.
   +  **Instance types** – By default, one or more instance type is specified. To remove a default instance type, select the `X` on the right side of the instance type. Choose the instance types to use in your managed node group. For more information, see [Choose an optimal Amazon EC2 node instance type](choosing-instance-type.md).

     The console displays a set of commonly used instance types. If you need to create a managed node group with an instance type that’s not displayed, then use `eksctl`, the AWS CLI, AWS CloudFormation, or an SDK to create the node group. If you specified a launch template on the previous page, then you can’t select a value because the instance type must be specified in the launch template. The value from the launch template is displayed. If you selected **Spot** for **Capacity type**, then we recommend specifying multiple instance types to enhance availability.
   +  **Disk size** – Enter the disk size (in GiB) to use for your node’s root volume.

     If you specified a launch template on the previous page, then you can’t select a value because it must be specified in the launch template.
   +  **Desired size** – Specify the current number of nodes that the managed node group should maintain at launch.
**Note**  
Amazon EKS doesn’t automatically scale your node group in or out. However, you can configure the Kubernetes Cluster Autoscaler to do this for you. For more information, see [Cluster Autoscaler on AWS](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md).
   +  **Minimum size** – Specify the minimum number of nodes that the managed node group can scale in to.
   +  **Maximum size** – Specify the maximum number of nodes that the managed node group can scale out to.
   +  **Node group update configuration** – (Optional) You can select the number or percentage of nodes to be updated in parallel. These nodes will be unavailable during the update. For **Maximum unavailable**, select one of the following options and specify a **Value**:
     +  **Number** – Select and specify the number of nodes in your node group that can be updated in parallel.
     +  **Percentage** – Select and specify the percentage of nodes in your node group that can be updated in parallel. This is useful if you have a large number of nodes in your node group.
   +  **Node auto repair configuration** – (Optional) If you activate the **Enable node auto repair** checkbox, Amazon EKS automatically replaces nodes when health issues are detected. For more information, see [Detect node health issues and enable automatic node repair](node-health.md).
   +  **Warm pool configuration** – (Optional) If you activate the **Enable warm pool configuration** checkbox, Amazon EKS creates a warm pool on the node group’s Auto Scaling group. For more information, see [Decrease latency for applications with long boot times using warm pools with managed node groups](warm-pools-managed-node-groups.md).

1. On the **Specify networking** page, fill out the parameters accordingly, and then choose **Next**.
   +  **Subnets** – Choose the subnets to launch your managed nodes into.
**Important**  
If you are running a stateful application across multiple Availability Zones that is backed by Amazon EBS volumes and using the Kubernetes [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md), you should configure multiple node groups, each scoped to a single Availability Zone. In addition, you should enable the `--balance-similar-node-groups` feature.
**Important**  
If you choose a public subnet, and your cluster has only the public API server endpoint enabled, then the subnet must have `MapPublicIpOnLaunch` set to `true` for the instances to successfully join a cluster. If the subnet was created using `eksctl` or the [Amazon EKS vended AWS CloudFormation templates](creating-a-vpc.md) on or after March 26, 2020, then this setting is already set to `true`. If the subnets were created with `eksctl` or the AWS CloudFormation templates before March 26, 2020, then you need to change the setting manually. For more information, see [Modifying the public IPv4 addressing attribute for your subnet](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-ip-addressing.html#subnet-public-ip).
If you use a launch template and specify multiple network interfaces, Amazon EC2 won’t auto-assign a public `IPv4` address, even if `MapPublicIpOnLaunch` is set to `true`. For nodes to join the cluster in this scenario, you must either enable the cluster’s private API server endpoint, or launch nodes in a private subnet with outbound internet access provided through an alternative method, such as a NAT Gateway. For more information, see [Amazon EC2 instance IP addressing](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-instance-addressing.html) in the *Amazon EC2 User Guide*.
   +  **Configure SSH access to nodes** (Optional). Enabling SSH allows you to connect to your instances and gather diagnostic information if there are issues. We highly recommend enabling remote access when you create a node group. You can’t enable remote access after the node group is created.

     If you chose to use a launch template, then this option isn’t shown. To enable remote access to your nodes, specify a key pair in the launch template and ensure that the proper port is open to the nodes in the security groups that you specify in the launch template. For more information, see [Using custom security groups](launch-templates.md#launch-template-security-groups).
**Note**  
For Windows, this option doesn’t enable SSH. Instead, it associates your Amazon EC2 key pair with the instance and allows you to RDP into the instance.
   + For **SSH key pair** (Optional), choose an Amazon EC2 SSH key to use. For Linux information, see [Amazon EC2 key pairs and Linux instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*. For Windows information, see [Amazon EC2 key pairs and Windows instances](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*. If you chose to use a launch template, then you can’t select one. When an Amazon EC2 SSH key is provided for node groups using Bottlerocket AMIs, the administrative container is also enabled. For more information, see [Admin container](https://github.com/bottlerocket-os/bottlerocket#admin-container) on GitHub.
   + For **Allow SSH remote access from**, if you want to limit access to specific instances, then select the security groups that are associated to those instances. If you don’t select specific security groups, then SSH access is allowed from anywhere on the internet (`0.0.0.0/0`).

1. On the **Review and create** page, review your managed node group configuration and choose **Create**.

   If nodes fail to join the cluster, then see [Nodes fail to join cluster](troubleshooting.md#worker-node-fail) in the Troubleshooting chapter.

1. Watch the status of your nodes and wait for them to reach the `Ready` status.

   ```
   kubectl get nodes --watch
   ```

1. (GPU nodes only) If you chose a GPU instance type and an Amazon EKS optimized accelerated AMI, then you must apply the [NVIDIA device plugin for Kubernetes](https://github.com/NVIDIA/k8s-device-plugin) as a DaemonSet on your cluster. Replace *vX.X.X* with your desired [NVIDIA/k8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin/releases) version before running the following command.

   ```
   kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/vX.X.X/deployments/static/nvidia-device-plugin.yml
   ```
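
   After the DaemonSet is running, you can confirm that GPUs are exposed as allocatable resources on your nodes. This check assumes the device plugin advertises the `nvidia.com/gpu` resource.

   ```
   kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
   ```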

## Install Kubernetes add-ons
<a name="_install_kubernetes_add_ons"></a>

Now that you have a working Amazon EKS cluster with nodes, you’re ready to start installing Kubernetes add-ons and deploying applications to your cluster. The following documentation topics help you to extend the functionality of your cluster.
+ The [IAM principal](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html#iam-term-principal) that created the cluster is the only principal that can make calls to the Kubernetes API server with `kubectl` or the AWS Management Console. If you want other IAM principals to have access to your cluster, then you need to add them. For more information, see [Grant IAM users and roles access to Kubernetes APIs](grant-k8s-access.md) and [Required permissions](view-kubernetes-resources.md#view-kubernetes-resources-permissions).
+ We recommend blocking Pod access to IMDS if the following conditions are true:
  + You plan to assign IAM roles to all of your Kubernetes service accounts so that Pods only have the minimum permissions that they need.
  + No Pods in the cluster require access to the Amazon EC2 instance metadata service (IMDS) for other reasons, such as retrieving the current AWS Region.

  For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).
+ Configure the Kubernetes [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md) to automatically adjust the number of nodes in your node groups.
+ Deploy a [sample application](sample-deployment.md) to your cluster.
+  [Organize and monitor cluster resources](eks-managing.md) with important tools for managing your cluster.

# Decrease latency for applications with long boot times using warm pools with managed node groups
<a name="warm-pools-managed-node-groups"></a>

When your applications have long initialization or boot times, scale-out events can cause delays because new nodes must fully boot and join the cluster before Pods can be scheduled on them. This latency can impact application availability during traffic spikes or rapid scaling events. Warm pools solve this problem by maintaining a pool of pre-initialized EC2 instances that have already completed the bootup process. During a scale-out event, instances move from the warm pool directly to your cluster, bypassing the time-consuming initialization steps and significantly reducing the time it takes for new capacity to become available. For more information, see [Decrease latency for applications that have long boot times using warm pools](https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-warm-pools.html) in the *Amazon EC2 Auto Scaling User Guide*.

Amazon EKS managed node groups support Amazon EC2 Auto Scaling warm pools. A warm pool maintains pre-initialized EC2 instances alongside your Auto Scaling group that can quickly join your cluster during scale-out events. Instances in the warm pool have already completed the bootup initialization process and can be kept in a `Stopped`, `Running`, or `Hibernated` state.

Amazon EKS manages warm pools throughout the node group lifecycle using the `AWSServiceRoleForAmazonEKSNodegroup` service-linked role to create, update, and delete warm pool resources.

## How it works
<a name="warm-pools-how-it-works"></a>

When you configure a warm pool, Amazon EKS creates an EC2 Auto Scaling warm pool attached to your node group’s Auto Scaling group. Instances launch into the warm pool, complete the bootup initialization process, and remain in the configured state (`Running`, `Stopped`, or `Hibernated`) until needed. During scale-out events, instances move from the warm pool to the Auto Scaling group, complete the Amazon EKS initialization process to join the cluster, and become available for pod scheduling. With instance reuse enabled, instances can return to the warm pool during scale-in events.

**Important**  
Always configure warm pools through the Amazon EKS API using `create-nodegroup` or `update-nodegroup-config`. Don’t manually modify warm pool settings using the EC2 Auto Scaling API, as this can cause conflicts with Amazon EKS management of the resources.
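If you want to inspect the warm pool that Amazon EKS creates, you can use read-only calls against the node group and its Auto Scaling group. This is a sketch that assumes the node group has a single associated Auto Scaling group; the cluster and node group names are placeholders.

```
# Find the Auto Scaling group behind the managed node group.
ASG_NAME=$(aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --query "nodegroup.resources.autoScalingGroups[0].name" \
  --output text)

# View the warm pool configuration and the instances currently in it (read-only).
aws autoscaling describe-warm-pool --auto-scaling-group-name "$ASG_NAME"
```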

## Considerations
<a name="warm-pools-considerations"></a>

**Important**  
Before configuring warm pools, review the prerequisites and limitations in [Warm pools for Amazon EC2 Auto Scaling](https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-warm-pools.html) in the *Amazon EC2 Auto Scaling User Guide*. Not all instance types, AMIs, or configurations are supported.
+  **IAM permissions** – The `AWSServiceRoleForAmazonEKSNodegroup` service-linked role (created automatically with your first managed node group) includes the necessary warm pool management permissions.
+  **AMI limitations** – Warm pools don’t support custom AMIs. You must use Amazon EKS optimized AMIs.
+  **Bottlerocket limitations** – If using Bottlerocket AMIs, the `Hibernated` pool state isn’t supported. Use `Stopped` or `Running` pool states only. Additionally, the `reuseOnScaleIn` feature isn’t supported with Bottlerocket AMIs.
+  **Hibernation support** – The `Hibernated` pool state is only supported on specific instance types. See [Hibernation prerequisites](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/hibernating-prerequisites.html) in the *Amazon EC2 User Guide* for supported instance types.
+  **Cost impact** – Creating a warm pool when it’s not required can lead to unnecessary costs.
+  **Capacity planning** – Size your warm pool based on scaling patterns to balance cost and availability. Start with 10-20% of expected peak capacity.
+  **VPC networking** – Ensure sufficient IP addresses for both Auto Scaling group and warm pool instances.

## Configure warm pools
<a name="warm-pools-configuration"></a>

You can configure warm pools when creating a new managed node group or update an existing managed node group to add warm pool support.

### Configuration parameters
<a name="warm-pools-parameters"></a>
+  **enabled** – (boolean) Indicates your intent to attach a warm pool to the managed node group. Required to enable warm pool support.
+  **maxGroupPreparedCapacity** – (integer) Maximum total instances across warm pool and Auto Scaling group combined.
+  **minSize** – (integer) Minimum number of instances to maintain in the warm pool. Default: `0`.
+  **poolState** – (string) State for warm pool instances. Default: `Stopped`.
+  **reuseOnScaleIn** – (boolean) Whether instances return to the warm pool during scale-in events instead of being terminated. Default: `false`. Not supported with Bottlerocket AMIs.

### Using the AWS CLI
<a name="warm-pools-create-cli"></a>

You can configure a warm pool when creating a managed node group or add one to an existing node group.

 **Create a node group with a warm pool** 

```
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --node-role arn:aws:iam::111122223333:role/AmazonEKSNodeRole \
  --subnets subnet-12345678 subnet-87654321 \
  --region us-east-1 \
  --scaling-config minSize=2,maxSize=10,desiredSize=3 \
  --warm-pool-config enabled=true,maxGroupPreparedCapacity=8,minSize=2,poolState=Stopped,reuseOnScaleIn=true
```

 **Add a warm pool to an existing node group** 

```
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --region us-east-1 \
  --warm-pool-config enabled=true,maxGroupPreparedCapacity=8,minSize=2,poolState=Stopped,reuseOnScaleIn=true
```

## Update configuration
<a name="warm-pools-update"></a>

Update warm pool settings at any time using `update-nodegroup-config`. Existing warm pool instances aren’t immediately affected; new settings apply to instances entering the warm pool after the update.

```
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --region us-east-1 \
  --warm-pool-config enabled=true,maxGroupPreparedCapacity=10,minSize=3,poolState=Running,reuseOnScaleIn=true
```

To disable the warm pool attached to your node group, set `enabled=false`:

```
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --region us-east-1 \
  --warm-pool-config enabled=false
```

## Additional resources
<a name="warm-pools-additional-resources"></a>
+  [Warm pools for Amazon EC2 Auto Scaling](https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-warm-pools.html) in the *Amazon EC2 Auto Scaling User Guide* 
+  [Simplify node lifecycle with managed node groups](managed-node-groups.md) 

# Update a managed node group for your cluster
<a name="update-managed-node-group"></a>

When you initiate a managed node group update, Amazon EKS automatically updates your nodes for you, completing the steps listed in [Understand each phase of node updates](managed-node-update-behavior.md). If you’re using an Amazon EKS optimized AMI, Amazon EKS automatically applies the latest security patches and operating system updates to your nodes as part of the latest AMI release version.

There are several scenarios where it’s useful to update your Amazon EKS managed node group’s version or configuration:
+ You have updated the Kubernetes version for your Amazon EKS cluster and want to update your nodes to use the same Kubernetes version.
+ A new AMI release version is available for your managed node group. For more information about AMI versions, see these sections:
  +  [Retrieve Amazon Linux AMI version information](eks-linux-ami-versions.md) 
  +  [Create nodes with optimized Bottlerocket AMIs](eks-optimized-ami-bottlerocket.md) 
  +  [Retrieve Windows AMI version information](eks-ami-versions-windows.md) 
+ You want to adjust the minimum, maximum, or desired count of the instances in your managed node group.
+ You want to add or remove Kubernetes labels from the instances in your managed node group.
+ You want to add or remove AWS tags from your managed node group.
+ You need to deploy a new version of a launch template with configuration changes, such as an updated custom AMI.
+ You have deployed version `1.9.0` or later of the Amazon VPC CNI add-on, enabled the add-on for prefix delegation, and want new AWS Nitro System instances in a node group to support a significantly increased number of Pods. For more information, see [Assign more IP addresses to Amazon EKS nodes with prefixes](cni-increase-ip-addresses.md).
+ You have enabled IP prefix delegation for Windows nodes and want new AWS Nitro System instances in a node group to support a significantly increased number of Pods. For more information, see [Assign more IP addresses to Amazon EKS nodes with prefixes](cni-increase-ip-addresses.md).

If there’s a newer AMI release version for your managed node group’s Kubernetes version, you can update your node group’s version to use the newer AMI version. Similarly, if your cluster is running a Kubernetes version that’s newer than your node group, you can update the node group to use the latest AMI release version to match your cluster’s Kubernetes version.

When a node in a managed node group is terminated due to a scaling operation or update, the Pods in that node are drained first. For more information, see [Understand each phase of node updates](managed-node-update-behavior.md).
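Before starting an update, you can compare your node group's current Kubernetes version and AMI release version against your cluster's version with the AWS CLI. A minimal sketch, using placeholder cluster, node group, and Region names:

```
# Kubernetes version of the control plane
aws eks describe-cluster \
  --name my-cluster \
  --region region-code \
  --query "cluster.version" \
  --output text

# Kubernetes version and AMI release version currently used by the node group
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --region region-code \
  --query "nodegroup.{kubernetesVersion: version, amiReleaseVersion: releaseVersion}" \
  --output table
```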

## Update a node group version
<a name="mng-update"></a>

You can update a node group version with either of the following:
+  [`eksctl`](#eksctl_update_managed_nodegroup) 
+  [AWS Management Console](#console_update_managed_nodegroup) 

The version that you update to can’t be greater than the control plane’s version.

## `eksctl`
<a name="eksctl_update_managed_nodegroup"></a>

 **Update a managed node group using `eksctl` ** 

Update a managed node group to the latest AMI release of the same Kubernetes version that’s currently deployed on the nodes with the following command. Replace every *example value* with your own values.

```
eksctl upgrade nodegroup \
  --name=node-group-name \
  --cluster=my-cluster \
  --region=region-code
```

**Note**  
If you’re upgrading a node group that’s deployed with a launch template to a new launch template version, add `--launch-template-version version-number` to the preceding command. The launch template must meet the requirements described in [Customize managed nodes with launch templates](launch-templates.md). If the launch template includes a custom AMI, the AMI must meet the requirements in [Specifying an AMI](launch-templates.md#launch-template-custom-ami). When you upgrade your node group to a newer version of your launch template, every node is recycled to match the new configuration of the launch template version that’s specified.

You can’t directly upgrade a node group that’s deployed without a launch template to a new launch template version. Instead, you must deploy a new node group using the launch template to update the node group to a new launch template version.

You can upgrade a node group to the same version as the control plane’s Kubernetes version. For example, if you have a cluster running Kubernetes `1.35`, you can upgrade nodes currently running Kubernetes `1.34` to version `1.35` with the following command.

```
eksctl upgrade nodegroup \
  --name=node-group-name \
  --cluster=my-cluster \
  --region=region-code \
  --kubernetes-version=1.35
```

## AWS Management Console
<a name="console_update_managed_nodegroup"></a>

 **Update a managed node group using the AWS Management Console ** 

1. Open the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. Choose the cluster that contains the node group to update.

1. If at least one node group has an available update, a box appears at the top of the page notifying you of the available update. If you select the **Compute** tab, you’ll see **Update now** in the **AMI release version** column in the **Node groups** table for the node group that has an available update. To update the node group, choose **Update now**.

   You won’t see a notification for node groups that were deployed with a custom AMI. If your nodes are deployed with a custom AMI, complete the following steps to deploy a new updated custom AMI.

   1. Create a new version of your AMI.

   1. Create a new launch template version with the new AMI ID.

   1. Upgrade the nodes to the new version of the launch template.

1. On the **Update node group version** dialog box, activate or deactivate the following options:
   +  **Update node group version** – This option is unavailable if you deployed a custom AMI or your Amazon EKS optimized AMI is currently on the latest version for your cluster.
   +  **Change launch template version** – This option is unavailable if the node group is deployed without a custom launch template. You can only update the launch template version for a node group that has been deployed with a custom launch template. Select the **Launch template version** that you want to update the node group to. If your node group is configured with a custom AMI, then the version that you select must also specify an AMI. When you upgrade to a newer version of your launch template, every node is recycled to match the new configuration of the launch template version specified.

1. For **Update strategy**, select one of the following options:
   +  **Rolling update** – This option respects the Pod disruption budgets for your cluster. Updates fail if there’s a Pod disruption budget issue that causes Amazon EKS to be unable to gracefully drain the Pods that are running on this node group.
   +  **Force update** – This option doesn’t respect Pod disruption budgets. Updates occur regardless of Pod disruption budget issues by forcing node restarts to occur.

1. Choose **Update**.

## Edit a node group configuration
<a name="mng-edit"></a>

You can modify some of the configurations of a managed node group.

1. Open the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. Choose the cluster that contains the node group to edit.

1. Select the **Compute** tab.

1. Select the node group to edit, and then choose **Edit**.

1. (Optional) On the **Edit node group** page, do the following:

   1. Edit the **Node group scaling configuration**.
      +  **Desired size** – Specify the current number of nodes that the managed node group should maintain.
      +  **Minimum size** – Specify the minimum number of nodes that the managed node group can scale in to.
      +  **Maximum size** – Specify the maximum number of nodes that the managed node group can scale out to. For the maximum number of nodes supported in a node group, see [View and manage Amazon EKS and Fargate service quotas](service-quotas.md).

   1. (Optional) Add or remove **Kubernetes labels** to the nodes in your node group. The labels shown here are only the labels that you have applied with Amazon EKS. Other labels may exist on your nodes that aren’t shown here.

   1. (Optional) Add or remove **Kubernetes taints** to the nodes in your node group. Added taints can have the effect of either `NoSchedule`, `NoExecute`, or `PreferNoSchedule`. For more information, see [Recipe: Prevent pods from being scheduled on specific nodes](node-taints-managed-node-groups.md).

   1. (Optional) Add or remove **Tags** from your node group resource. These tags are only applied to the Amazon EKS node group. They don’t propagate to other resources, such as subnets or Amazon EC2 instances in the node group.

   1. (Optional) Edit the **Node Group update configuration**. Select either **Number** or **Percentage**.
      +  **Number** – Select and specify the number of nodes in your node group that can be updated in parallel. These nodes will be unavailable during update.
      +  **Percentage** – Select and specify the percentage of nodes in your node group that can be updated in parallel. These nodes will be unavailable during update. This is useful if you have many nodes in your node group.

   1. When you’re finished editing, choose **Save changes**.

**Important**  
When updating the node group configuration, modifying the [node group scaling configuration](https://docs.aws.amazon.com/eks/latest/APIReference/API_NodegroupScalingConfig.html) does not respect Pod disruption budgets (PDBs). Unlike the [update node group](managed-node-update-behavior.md) process (which drains nodes and respects PDBs during the upgrade phase), updating the scaling configuration causes nodes to be terminated immediately through an Auto Scaling Group (ASG) scale-down call. This happens without considering PDBs, regardless of the target size you’re scaling down to. As a result, when you reduce the `desiredSize` of an Amazon EKS managed node group, Pods are evicted as the nodes are terminated, without honoring any PDBs.
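You can make many of the same configuration changes with the AWS CLI instead of the console. The following is a minimal sketch using placeholder names; the scaling sizes and label payload are illustrative assumptions only. Keep the preceding note about Pod disruption budgets in mind when you reduce the desired size.

```
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --region region-code \
  --scaling-config minSize=2,maxSize=6,desiredSize=3 \
  --labels '{"addOrUpdateLabels":{"environment":"production"},"removeLabels":["deprecated-label"]}'
```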

# Understand each phase of node updates
<a name="managed-node-update-behavior"></a>

The Amazon EKS managed worker node upgrade strategy has four different phases described in the following sections.

## Setup phase
<a name="managed-node-update-set-up"></a>

The setup phase has these steps:

1. It creates a new Amazon EC2 launch template version for the Auto Scaling Group that’s associated with your node group. The new launch template version uses the target AMI or a custom launch template version for the update.

1. It updates the Auto Scaling Group to use the latest launch template version.

1. It determines the maximum quantity of nodes to upgrade in parallel using the `updateConfig` property for the node group. The maximum unavailable has a quota of 100 nodes. The default value is one node. For more information, see the [updateConfig](https://docs.aws.amazon.com/eks/latest/APIReference/API_UpdateNodegroupConfig.html#API_UpdateNodegroupConfig_RequestSyntax) property in the *Amazon EKS API Reference*.
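You can control this parallelism yourself by setting the node group's update configuration before you start an update. A minimal sketch with placeholder names, allowing up to two nodes to be unavailable in parallel; a percentage-based limit is an alternative:

```
# Allow up to 2 nodes to be updated in parallel
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --region region-code \
  --update-config maxUnavailable=2

# Or express the limit as a percentage of the node group:
#   --update-config maxUnavailablePercentage=25
```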

## Scale up phase
<a name="managed-node-update-scale-up"></a>

When upgrading the nodes in a managed node group, the upgraded nodes are launched in the same Availability Zone as those that are being upgraded. To guarantee this placement, we use Amazon EC2’s Availability Zone Rebalancing. For more information, see [Availability Zone Rebalancing](https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-benefits.html#AutoScalingBehavior.InstanceUsage) in the *Amazon EC2 Auto Scaling User Guide*. To meet this requirement, it’s possible that we’d launch up to two instances per Availability Zone in your managed node group.

The scale up phase has these steps:

1. It increments the Auto Scaling Group’s maximum size and desired size by the larger of either:
   + Up to twice the number of Availability Zones that the Auto Scaling Group is deployed in.
   + The maximum unavailable (`maxUnavailable`) value configured for the upgrade.

     For example, if your node group has five Availability Zones and `maxUnavailable` as one, the upgrade process can launch a maximum of 10 nodes. However, when `maxUnavailable` is 20 (or anything higher than 10), the process would launch 20 new nodes.

1. After scaling the Auto Scaling Group, it checks if the nodes using the latest configuration are present in the node group. This step succeeds only when it meets these criteria:
   + At least one new node is launched in every Availability Zone where the node exists.
   + Every new node should be in `Ready` state.
   + New nodes should have Amazon EKS applied labels.

     These are the Amazon EKS applied labels on the worker nodes in a regular node group:
     +  `eks.amazonaws.com/nodegroup-image=$amiName` 
     +  `eks.amazonaws.com/nodegroup=$nodeGroupName` 

     These are the Amazon EKS applied labels on the worker nodes in a custom launch template or AMI node group:
     +  `eks.amazonaws.com/nodegroup-image=$amiName` 
     +  `eks.amazonaws.com/nodegroup=$nodeGroupName` 
     +  `eks.amazonaws.com/sourceLaunchTemplateId=$launchTemplateId` 
     +  `eks.amazonaws.com/sourceLaunchTemplateVersion=$launchTemplateVersion` 
**Note**  
When an update or upgrade is initiated without changes to the scaling configuration, the workflow uses the live Auto Scaling group values as the starting point, not the node group’s stored scaling configuration. For more information, see [Managed node groups concepts](managed-node-groups.md#managed-node-group-concepts).

1. It marks nodes as unschedulable to avoid scheduling new Pods. It also labels nodes with `node.kubernetes.io/exclude-from-external-load-balancers=true` to remove the old nodes from load balancers before terminating the nodes.

The following are known reasons which lead to a `NodeCreationFailure` error in this phase:

 **Insufficient capacity in the Availability Zone**   
The Availability Zone might not have capacity for the requested instance types. We recommend configuring multiple instance types when creating a managed node group.

 **EC2 instance limits in your account**   
You may need to increase the number of Amazon EC2 instances your account can run simultaneously using Service Quotas. For more information, see [EC2 Service Quotas](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-resource-limits.html) in the *Amazon Elastic Compute Cloud User Guide for Linux Instances*.

 **Custom user data**   
Custom user data can sometimes break the bootstrap process. This scenario can lead to the `kubelet` not starting on the node or nodes not getting expected Amazon EKS labels on them. For more information, see [Specifying an AMI](launch-templates.md#launch-template-custom-ami).

 **Any changes which make a node unhealthy or not ready**   
Node disk pressure, memory pressure, and similar conditions can prevent a node from reaching the `Ready` state.

 **Each node must bootstrap within 15 minutes**   
If any node takes more than 15 minutes to bootstrap and join the cluster, it will cause the upgrade to time out. This is the total runtime for bootstrapping a new node measured from when a new node is required to when it joins the cluster. When upgrading a managed node group, the time counter starts as soon as the Auto Scaling Group size increases.

## Upgrade phase
<a name="managed-node-update-upgrade"></a>

The upgrade phase behaves in two different ways, depending on the *update strategy*. There are two update strategies: **default** and **minimal**.

We recommend the default strategy in most scenarios. It creates new nodes before terminating the old ones, so that the available capacity is maintained during the upgrade phase. The minimal strategy is useful in scenarios where you're constrained on resources or costs, for example with hardware accelerators such as GPUs. It terminates the old nodes before creating the new ones, so that total capacity never increases beyond your configured quantity.
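The update strategy is part of the node group's update configuration. The following is a sketch of selecting the minimal strategy with the AWS CLI, assuming placeholder names and that your AWS CLI version exposes the `updateStrategy` field of the node group's update configuration:

```
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --region region-code \
  --update-config maxUnavailable=1,updateStrategy=MINIMAL
```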

The *default* update strategy has these steps:

1. It increases the quantity of nodes (desired count) in the Auto Scaling Group, causing the node group to create additional nodes.

1. It randomly selects a node that needs to be upgraded, up to the maximum unavailable configured for the node group.

1. It drains the Pods from the node. If the Pods don’t leave the node within 15 minutes and there’s no force flag, the upgrade phase fails with a `PodEvictionFailure` error. For this scenario, you can apply the force flag with the `update-nodegroup-version` request to delete the Pods.

1. It cordons the node after every Pod is evicted and waits for 60 seconds. This is done so that the service controller doesn’t send any new requests to this node and removes this node from its list of active nodes.

1. It sends a termination request to the Auto Scaling Group for the cordoned node.

1. It repeats the previous upgrade steps until there are no nodes in the node group that are deployed with the earlier version of the launch template.

The *minimal* update strategy has these steps:

1. It cordons all nodes of the node group in the beginning, so that the service controller doesn’t send any new requests to these nodes.

1. It randomly selects a node that needs to be upgraded, up to the maximum unavailable configured for the node group.

1. It drains the Pods from the selected nodes. If the Pods don’t leave the node within 15 minutes and there’s no force flag, the upgrade phase fails with a `PodEvictionFailure` error. For this scenario, you can apply the force flag with the `update-nodegroup-version` request to delete the Pods.

1. After every Pod is evicted, it waits for 60 seconds, and then sends a termination request to the Auto Scaling Group for the selected nodes. The Auto Scaling Group creates new nodes (the same number as the selected nodes) to replace the missing capacity.

1. It repeats the previous upgrade steps until there are no nodes in the node group that are deployed with the earlier version of the launch template.

### `PodEvictionFailure` errors during the upgrade phase
<a name="_podevictionfailure_errors_during_the_upgrade_phase"></a>

The following are known reasons which lead to a `PodEvictionFailure` error in this phase:

 **Aggressive PDB**   
Aggressive PDB is defined on the Pod or there are multiple PDBs pointing to the same Pod.

 **Deployment tolerating all the taints**   
Once every Pod is evicted, it’s expected for the node to be empty because the node is [tainted](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) in the earlier steps. However, if the deployment tolerates every taint, then the node is more likely to be non-empty, leading to Pod eviction failure.
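If you've confirmed that the blocking Pods can safely be evicted, you can retry the update with the force flag, which tells Amazon EKS to proceed even when Pods can't be drained because of a Pod disruption budget issue. A minimal sketch with placeholder names:

```
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --region region-code \
  --force
```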

## Scale down phase
<a name="managed-node-update-scale-down"></a>

The scale down phase decrements the Auto Scaling group maximum size and desired size by one to return to values before the update started.

If the Upgrade workflow determines that the Cluster Autoscaler is scaling up the node group during the scale down phase of the workflow, it exits immediately without bringing the node group back to its original size.

**Note**  
If your node group has a warm pool enabled, warm pool instances are drained before the scale-up operation begins. This is because warm pool instances haven't been updated to the new launch template configuration. During the scale-up phase, they would be pulled into the Auto Scaling Group instead of launching new instances with the updated configuration, which would break the upgrade process. Draining the warm pool ensures that only new instances with the updated configuration are launched. Once the scale-down operation completes, the warm pool is restored, and the new instances in the warm pool are launched with the updated launch template configuration.
For more information about warm pools, see [Decrease latency for applications with long boot times using warm pools with managed node groups](warm-pools-managed-node-groups.md).

# Customize managed nodes with launch templates
<a name="launch-templates"></a>

For the highest level of customization, you can deploy managed nodes with your own launch template based on the steps on this page. Using a launch template enables capabilities such as providing bootstrap arguments during node deployment (for example, extra [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) arguments), assigning IP addresses to Pods from a different CIDR block than the one assigned to the node, deploying your own custom AMI to nodes, or deploying your own custom CNI to nodes.

When you provide your own launch template while first creating a managed node group, you also have greater flexibility later. As long as you deploy a managed node group with your own launch template, you can iteratively update it with a different version of the same launch template. When you update your node group to a different version of your launch template, all nodes in the group are recycled to match the new configuration of the specified launch template version.

Managed node groups are always deployed with a launch template to be used with the Amazon EC2 Auto Scaling group. When you don’t provide a launch template, the Amazon EKS API creates one automatically with default values in your account. However, we don’t recommend that you modify auto-generated launch templates. Furthermore, existing node groups that don’t use a custom launch template can’t be updated directly. Instead, you must create a new node group with a custom launch template to do so.

## Launch template configuration basics
<a name="launch-template-basics"></a>

You can create an Amazon EC2 Auto Scaling launch template with the AWS Management Console, AWS CLI, or an AWS SDK. For more information, see [Creating a Launch Template for an Auto Scaling group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/create-launch-template.html) in the *Amazon EC2 Auto Scaling User Guide*. Some of the settings in a launch template are similar to the settings used for managed node configuration. When deploying or updating a node group with a launch template, some settings must be specified in either the node group configuration or the launch template. Don’t specify a setting in both places. If a setting exists where it shouldn’t, then operations such as creating or updating a node group fail.

The following table lists the settings that are prohibited in a launch template. It also lists similar settings, if any are available, that are required in the managed node group configuration. The listed settings are the settings that appear in the console. They might have similar but different names in the AWS CLI and SDK.


| Launch template – Prohibited | Amazon EKS node group configuration | 
| --- | --- | 
|   **Subnet** under **Network interfaces** (**Add network interface**)  |   **Subnets** under **Node group network configuration** on the **Specify networking** page  | 
|   **IAM instance profile** under **Advanced details**   |   **Node IAM role** under **Node group configuration** on the **Configure Node group** page  | 
|   **Shutdown behavior** and **Stop - Hibernate behavior** under **Advanced details**. Retain default **Don’t include in launch template setting** in launch template for both settings.  |  No equivalent. Amazon EKS must control the instance lifecycle, not the Auto Scaling group.  | 

The following table lists the prohibited settings in a managed node group configuration. It also lists similar settings, if any are available, which are required in a launch template. The listed settings are the settings that appear in the console. They might have similar names in the AWS CLI and SDK.


| Amazon EKS node group configuration – Prohibited | Launch template | 
| --- | --- | 
|  (Only if you specified a custom AMI in a launch template) **AMI type** under **Node group compute configuration** on **Set compute and scaling configuration** page – Console displays **Specified in launch template** and the AMI ID that was specified. If **Application and OS Images (Amazon Machine Image)** wasn’t specified in the launch template, you can select an AMI in the node group configuration.  |   **Application and OS Images (Amazon Machine Image)** under **Launch template contents** – You must specify an ID if you have either of the following requirements: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html)  | 
|   **Disk size** under **Node group compute configuration** on **Set compute and scaling configuration** page – Console displays **Specified in launch template**.  |   **Size** under **Storage (Volumes)** (**Add new volume**). You must specify this in the launch template.  | 
|   **SSH key pair** under **Node group configuration** on the **Specify Networking** page – The console displays the key that was specified in the launch template or displays **Not specified in launch template**.  |   **Key pair name** under **Key pair (login)**.  | 
|  You can’t specify source security groups that are allowed remote access when using a launch template.  |   **Security groups** under **Network settings** for the instance or **Security groups** under **Network interfaces** (**Add network interface**), but not both. For more information, see [Using custom security groups](#launch-template-security-groups).  | 

**Note**  
If you deploy a node group using a launch template, specify zero or one **Instance type** under **Launch template contents** in a launch template. Alternatively, you can specify 0–20 instance types for **Instance types** on the **Set compute and scaling configuration** page in the console. Or, you can do so using other tools that use the Amazon EKS API. If you specify an instance type in a launch template, and use that launch template to deploy your node group, then you can’t specify any instance types in the console or using other tools that use the Amazon EKS API. If you don’t specify an instance type in a launch template, in the console, or using other tools that use the Amazon EKS API, the `t3.medium` instance type is used. If your node group is using the Spot capacity type, then we recommend specifying multiple instance types using the console. For more information, see [Managed node group capacity types](managed-node-groups.md#managed-node-group-capacity-types).
If any containers that you deploy to the node group use the Instance Metadata Service Version 2, make sure to set the **Metadata response hop limit** to `2` in your launch template. For more information, see [Instance metadata and user data](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html) in the *Amazon EC2 User Guide*.
Managed node groups don't support launch templates that use the `InstanceRequirements` feature for flexible instance type selection.

## Tagging Amazon EC2 instances
<a name="launch-template-tagging"></a>

You can use the `TagSpecification` parameter of a launch template to specify which tags to apply to Amazon EC2 instances in your node group. The IAM entity calling the `CreateNodegroup` or `UpdateNodegroupVersion` APIs must have permissions for `ec2:RunInstances` and `ec2:CreateTags`, and the tags must be added to the launch template.
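The following is a minimal sketch of a launch template that tags instances at launch; the template name, tag keys, and values are placeholder assumptions:

```
aws ec2 create-launch-template \
  --launch-template-name my-eks-node-template \
  --launch-template-data '{
    "TagSpecifications": [
      {
        "ResourceType": "instance",
        "Tags": [
          {"Key": "team", "Value": "platform"},
          {"Key": "environment", "Value": "production"}
        ]
      }
    ]
  }'
```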

## Using custom security groups
<a name="launch-template-security-groups"></a>

You can use a launch template to specify custom Amazon EC2 [security groups](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-security-groups.html) to apply to instances in your node group. This can be either in the instance level security groups parameter or as part of the network interface configuration parameters. However, you can’t create a launch template that specifies both instance level and network interface security groups. Consider the following conditions that apply to using custom security groups with managed node groups:
+ When using the AWS Management Console, Amazon EKS only allows launch templates with a single network interface specification.
+ By default, Amazon EKS applies the [cluster security group](sec-group-reqs.md) to the instances in your node group to facilitate communication between nodes and the control plane. If you specify custom security groups in the launch template using either option mentioned earlier, Amazon EKS doesn’t add the cluster security group. So, you must ensure that the inbound and outbound rules of your security groups enable communication with the endpoint of your cluster. If your security group rules are incorrect, the worker nodes can’t join the cluster. For more information about security group rules, see [View Amazon EKS security group requirements for clusters](sec-group-reqs.md).
+ If you need SSH access to the instances in your node group, include a security group that allows that access.
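The following is a minimal sketch of a launch template that specifies instance level security groups; the template name and security group IDs are placeholders, and the groups must allow communication with your cluster endpoint as described in the preceding conditions:

```
aws ec2 create-launch-template \
  --launch-template-name my-eks-node-sg-template \
  --launch-template-data '{
    "SecurityGroupIds": ["sg-0123456789abcdef0", "sg-0fedcba9876543210"]
  }'
```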

## Amazon EC2 user data
<a name="launch-template-user-data"></a>

The launch template includes a section for custom user data. You can specify configuration settings for your node group in this section without manually creating individual custom AMIs. For more information about the settings available for Bottlerocket, see [Using user data](https://github.com/bottlerocket-os/bottlerocket#using-user-data) on GitHub.

You can supply Amazon EC2 user data in your launch template using `cloud-init` when launching your instances. For more information, see the [cloud-init](https://cloudinit.readthedocs.io/en/latest/index.html) documentation. Your user data can be used to perform common configuration operations. This includes the following operations:
+  [Including users or groups](https://cloudinit.readthedocs.io/en/latest/topics/examples.html#including-users-and-groups) 
+  [Installing packages](https://cloudinit.readthedocs.io/en/latest/topics/examples.html#install-arbitrary-packages) 

Amazon EC2 user data in launch templates that are used with managed node groups must be in the [MIME multi-part archive](https://cloudinit.readthedocs.io/en/latest/topics/format.html#mime-multi-part-archive) format for Amazon Linux AMIs and TOML format for Bottlerocket AMIs. This is because your user data is merged with the Amazon EKS user data that's required for nodes to join the cluster. Don't specify any commands in your user data that start or modify `kubelet`. This is performed as part of the user data merged by Amazon EKS. Certain `kubelet` parameters, such as setting labels on nodes, can be configured directly through the managed node groups API.

**Note**  
For more information about advanced `kubelet` customization, including manually starting it or passing in custom configuration parameters, see [Specifying an AMI](#launch-template-custom-ami). If a custom AMI ID is specified in a launch template, Amazon EKS doesn’t merge user data.

The following details provide more information about the user data section.

 **Amazon Linux 2 user data**   
You can combine multiple user data blocks together into a single MIME multi-part file. For example, you can combine a cloud boothook that configures the Docker daemon with a user data shell script that installs a custom package. A MIME multi-part file consists of the following components:  
+ The content type and part boundary declaration – `Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="` 
+ The MIME version declaration – `MIME-Version: 1.0` 
+ One or more user data blocks, which contain the following components:
  + The opening boundary, which signals the beginning of a user data block – `--==MYBOUNDARY==` 
  + The content type declaration for the block: `Content-Type: text/cloud-config; charset="us-ascii"`. For more information about content types, see the [cloud-init](https://cloudinit.readthedocs.io/en/latest/topics/format.html) documentation.
  + The content of the user data (for example, a list of shell commands or `cloud-init` directives).
  + The closing boundary, which signals the end of the MIME multi-part file: `--==MYBOUNDARY==--` 

  The following is an example of a MIME multi-part file that you can use to create your own.

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
echo "Running custom user data script"

--==MYBOUNDARY==--
```

 **Amazon Linux 2023 user data**   
Amazon Linux 2023 (AL2023) introduces a new node initialization process `nodeadm` that uses a YAML configuration schema. If you’re using self-managed node groups or an AMI with a launch template, you’ll now need to provide additional cluster metadata explicitly when creating a new node group. An [example](https://awslabs.github.io/amazon-eks-ami/nodeadm/) of the minimum required parameters is as follows, where `apiServerEndpoint`, `certificateAuthority`, and service `cidr` are now required:  

```
---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: https://example.com
    certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
    cidr: 10.100.0.0/16
```
You’ll typically set this configuration in your user data, either as-is or embedded within a MIME multi-part document:  

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec: [...]

--BOUNDARY--
```
In AL2, the metadata from these parameters was discovered from the Amazon EKS `DescribeCluster` API call. With AL2023, this behavior has changed because the additional API call risks throttling during large node scale-ups. This change doesn’t affect you if you’re using managed node groups without a launch template or if you’re using Karpenter. For more information on `certificateAuthority` and service `cidr`, see [DescribeCluster](https://docs.aws.amazon.com/eks/latest/APIReference/API_DescribeCluster.html) in the *Amazon EKS API Reference*.  
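To populate `apiServerEndpoint`, `certificateAuthority`, and the service `cidr`, you can retrieve the values from your cluster with the AWS CLI. A minimal sketch, using placeholder cluster and Region names:  

```
aws eks describe-cluster \
  --name my-cluster \
  --region region-code \
  --query "{endpoint: cluster.endpoint, certificateAuthority: cluster.certificateAuthority.data, serviceCidr: cluster.kubernetesNetworkConfig.serviceIpv4Cidr}" \
  --output json
```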
Here’s a complete example of AL2023 user data that combines a shell script for customizing the node (like installing packages or pre-caching container images) with the required `nodeadm` configuration. This example shows common customizations, including:  
+ Installing additional system packages
+ Pre-caching container images to improve Pod startup time
+ Setting up HTTP proxy configuration
+ Configuring `kubelet` flags for node labeling

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
set -o errexit
set -o pipefail
set -o nounset

# Install additional packages
yum install -y htop jq iptables-services

# Pre-cache commonly used container images
nohup docker pull public.ecr.aws/eks-distro/kubernetes/pause:3.2 &

# Configure HTTP proxy if needed
cat > /etc/profile.d/http-proxy.sh << 'EOF'
export HTTP_PROXY="http://proxy.example.com:3128"
export HTTPS_PROXY="http://proxy.example.com:3128"
export NO_PROXY="localhost,127.0.0.1,169.254.169.254,.internal"
EOF

--BOUNDARY
Content-Type: application/node.eks.aws

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: https://example.com
    certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
    cidr: 10.100.0.0/16
  kubelet:
    config:
      clusterDNS:
      - 10.100.0.10
    flags:
    - --node-labels=app=my-app,environment=production

--BOUNDARY--
```

 **Bottlerocket user data**   
Bottlerocket structures user data in the TOML format. You can provide user data to be merged with the user data provided by Amazon EKS. For example, you can provide additional `kubelet` settings.  

```
[settings.kubernetes.system-reserved]
cpu = "10m"
memory = "100Mi"
ephemeral-storage= "1Gi"
```
For more information about the supported settings, see [Bottlerocket documentation](https://github.com/bottlerocket-os/bottlerocket). You can configure node labels and [taints](node-taints-managed-node-groups.md) in your user data. However, we recommend that you configure these within your node group instead. Amazon EKS applies these configurations when you do so.  
When user data is merged, formatting isn’t preserved, but the content remains the same. The configuration that you provide in your user data overrides any settings that are configured by Amazon EKS. So, if you set `settings.kubernetes.max-pods` or `settings.kubernetes.cluster-dns-ip`, these values in your user data are applied to the nodes.  
Amazon EKS doesn’t support all valid TOML. The following is a list of known unsupported formats:  
+ Quotes within quoted keys: `'quoted "value"' = "value"` 
+ Escaped quotes in values: `str = "I'm a string. \"You can quote me\""` 
+ Mixed floats and integers: `numbers = [ 0.1, 0.2, 0.5, 1, 2, 5 ]` 
+ Mixed types in arrays: `contributors = ["foo@example.com", { name = "Baz", email = "baz@example.com" }]` 
+ Bracketed headers with quoted keys: `[foo."bar.baz"]` 

 **Windows user data**   
Windows user data uses PowerShell commands. When creating a managed node group, your custom user data combines with Amazon EKS managed user data. Your PowerShell commands come first, followed by the managed user data commands, all within one `<powershell></powershell>` tag.  
When creating Windows node groups, Amazon EKS updates the `aws-auth` `ConfigMap` to allow Linux-based nodes to join the cluster. The service doesn’t automatically configure permissions for Windows AMIs. If you’re using Windows nodes, you’ll need to manage access either via the access entry API or by updating the `aws-auth` `ConfigMap` directly. For more information, see [Deploy Windows nodes on EKS clusters](windows-support.md).
When no AMI ID is specified in the launch template, don’t use the Windows Amazon EKS Bootstrap script in user data to configure Amazon EKS.
Example user data is as follows.  

```
<powershell>
Write-Host "Running custom user data script"
</powershell>
```

## Specifying an AMI
<a name="launch-template-custom-ami"></a>

If you have either of the following requirements, then specify an AMI ID in the `ImageId` field of your launch template. Select the requirement you have for additional information.

### Provide user data to pass arguments to the `bootstrap.sh` file included with an Amazon EKS optimized Linux/Bottlerocket AMI
<a name="mng-specify-eks-ami"></a>

Bootstrapping is a term used to describe adding commands that can be run when an instance starts. For example, bootstrapping allows using extra [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) arguments. You can pass arguments to the `bootstrap.sh` script by using `eksctl` without specifying a launch template. Or you can do so by specifying the information in the user data section of a launch template.

 **eksctl without specifying a launch template**   
Create a file named *my-nodegroup.yaml* with the following contents. Replace every *example value* with your own values. The `--apiserver-endpoint`, `--b64-cluster-ca`, and `--dns-cluster-ip` arguments are optional. However, defining them allows the `bootstrap.sh` script to avoid making a `describeCluster` call. This is useful in private cluster setups or clusters where you’re scaling in and out nodes frequently. For more information on the `bootstrap.sh` script, see the [bootstrap.sh](https://github.com/awslabs/amazon-eks-ami/blob/main/templates/al2/runtime/bootstrap.sh) file on GitHub.  
+ The only required argument is the cluster name (*my-cluster*).
+ To retrieve an optimized AMI ID to use in place of `ami-1234567890abcdef0`, see the following sections:
  +  [Retrieve recommended Amazon Linux AMI IDs](retrieve-ami-id.md) 
  +  [Retrieve recommended Bottlerocket AMI IDs](retrieve-ami-id-bottlerocket.md) 
  +  [Retrieve recommended Microsoft Windows AMI IDs](retrieve-windows-ami-id.md) 
+ To retrieve the *certificate-authority* for your cluster, run the following command.

  ```
  aws eks describe-cluster --query "cluster.certificateAuthority.data" --output text --name my-cluster --region region-code
  ```
+ To retrieve the *api-server-endpoint* for your cluster, run the following command.

  ```
  aws eks describe-cluster --query "cluster.endpoint" --output text --name my-cluster --region region-code
  ```
+ The value for `--dns-cluster-ip` is your service CIDR with `.10` at the end. To retrieve the *service-cidr* for your cluster, run the following command. For example, if the returned value is `10.100.0.0/16`, then your value is *10.100.0.10*.

  ```
  aws eks describe-cluster --query "cluster.kubernetesNetworkConfig.serviceIpv4Cidr" --output text --name my-cluster --region region-code
  ```
+ This example provides a `kubelet` argument to set a custom `max-pods` value using the `bootstrap.sh` script included with the Amazon EKS optimized AMI. The node group name can't be longer than 63 characters. It must start with a letter or digit, but can also include hyphens and underscores for the remaining characters. For help with selecting *my-max-pods-value*, including how `maxPods` is determined when using managed node groups, see [How maxPods is determined](choosing-instance-type.md#max-pods-precedence).

  ```
  ---
  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  
  metadata:
    name: my-cluster
    region: region-code
  
  managedNodeGroups:
    - name: my-nodegroup
      ami: ami-1234567890abcdef0
      instanceType: m5.large
      privateNetworking: true
      disableIMDSv1: true
      labels: { x86-al2-specified-mng: "true" } # labels must be key: value pairs
      overrideBootstrapCommand: |
        #!/bin/bash
        /etc/eks/bootstrap.sh my-cluster \
          --b64-cluster-ca certificate-authority \
          --apiserver-endpoint api-server-endpoint \
          --dns-cluster-ip service-cidr.10 \
          --kubelet-extra-args '--max-pods=my-max-pods-value' \
          --use-max-pods false
  ```

  For every available `eksctl` `config` file option, see [Config file schema](https://eksctl.io/usage/schema/) in the `eksctl` documentation. The `eksctl` utility still creates a launch template for you and populates its user data with the data that you provide in the `config` file.

  Create a node group with the following command.

  ```
  eksctl create nodegroup --config-file=my-nodegroup.yaml
  ```

 **User data in a launch template**   
Specify the following information in the user data section of your launch template. Replace every *example value* with your own values. The `--apiserver-endpoint`, `--b64-cluster-ca`, and `--dns-cluster-ip` arguments are optional. However, defining them allows the `bootstrap.sh` script to avoid making a `describeCluster` call. This is useful in private cluster setups or clusters where you’re scaling in and out nodes frequently. For more information on the `bootstrap.sh` script, see the [bootstrap.sh](https://github.com/awslabs/amazon-eks-ami/blob/main/templates/al2/runtime/bootstrap.sh) file on GitHub.  
+ The only required argument is the cluster name (*my-cluster*).
+ To retrieve the *certificate-authority* for your cluster, run the following command.

  ```
  aws eks describe-cluster --query "cluster.certificateAuthority.data" --output text --name my-cluster --region region-code
  ```
+ To retrieve the *api-server-endpoint* for your cluster, run the following command.

  ```
  aws eks describe-cluster --query "cluster.endpoint" --output text --name my-cluster --region region-code
  ```
+ The value for `--dns-cluster-ip` is your service CIDR with `.10` at the end. To retrieve the *service-cidr* for your cluster, run the following command. For example, if the returned value is `10.100.0.0/16`, then your value is *10.100.0.10*.

  ```
  aws eks describe-cluster --query "cluster.kubernetesNetworkConfig.serviceIpv4Cidr" --output text --name my-cluster --region region-code
  ```
+ This example provides a `kubelet` argument to set a custom `max-pods` value using the `bootstrap.sh` script included with the Amazon EKS optimized AMI. For help with selecting *my-max-pods-value*, including how `maxPods` is determined when using managed node groups, see [How maxPods is determined](choosing-instance-type.md#max-pods-precedence).

  ```
  MIME-Version: 1.0
  Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="
  
  --==MYBOUNDARY==
  Content-Type: text/x-shellscript; charset="us-ascii"
  
  #!/bin/bash
  set -ex
  /etc/eks/bootstrap.sh my-cluster \
    --b64-cluster-ca certificate-authority \
    --apiserver-endpoint api-server-endpoint \
    --dns-cluster-ip service-cidr.10 \
    --kubelet-extra-args '--max-pods=my-max-pods-value' \
    --use-max-pods false
  
  --==MYBOUNDARY==--
  ```

### Provide user data to pass arguments to the `Start-EKSBootstrap.ps1` file included with an Amazon EKS optimized Windows AMI
<a name="mng-specify-eks-ami-windows"></a>

Bootstrapping is a term used to describe adding commands that can be run when an instance starts. You can pass arguments to the `Start-EKSBootstrap.ps1` script by using `eksctl` without specifying a launch template. Or you can do so by specifying the information in the user data section of a launch template.

If you want to specify a custom Windows AMI ID, keep in mind the following considerations:
+ You must use a launch template and give the required bootstrap commands in the user data section. To retrieve your desired Windows AMI ID, you can use the table in [Create nodes with optimized Windows AMIs](eks-optimized-windows-ami.md).
+ There are several limits and conditions. For example, you must add `eks:kube-proxy-windows` to your AWS IAM Authenticator configuration map. For more information, see [Limits and conditions when specifying an AMI ID](#mng-ami-id-conditions).

Specify the following information in the user data section of your launch template. Replace every *example value* with your own values. The `-APIServerEndpoint`, `-Base64ClusterCA`, and `-DNSClusterIP` arguments are optional. However, defining them allows the `Start-EKSBootstrap.ps1` script to avoid making a `describeCluster` call.
+ The only required argument is the cluster name (*my-cluster*).
+ To retrieve the *certificate-authority* for your cluster, run the following command.

  ```
  aws eks describe-cluster --query "cluster.certificateAuthority.data" --output text --name my-cluster --region region-code
  ```
+ To retrieve the *api-server-endpoint* for your cluster, run the following command.

  ```
  aws eks describe-cluster --query "cluster.endpoint" --output text --name my-cluster --region region-code
  ```
+ The value for `-DNSClusterIP` is your service CIDR with `.10` at the end. To retrieve the *service-cidr* for your cluster, run the following command. For example, if the returned value is `10.100.0.0/16`, then your value is *10.100.0.10*.

  ```
  aws eks describe-cluster --query "cluster.kubernetesNetworkConfig.serviceIpv4Cidr" --output text --name my-cluster --region region-code
  ```
+ For additional arguments, see [Bootstrap script configuration parameters](eks-optimized-windows-ami.md#bootstrap-script-configuration-parameters).
**Note**  
If you're using a custom service CIDR, then you need to specify it using the `-ServiceCIDR` parameter. Otherwise, the DNS resolution for Pods in the cluster will fail.

```
<powershell>
[string]$EKSBootstrapScriptFile = "$env:ProgramFiles\Amazon\EKS\Start-EKSBootstrap.ps1"
& $EKSBootstrapScriptFile -EKSClusterName my-cluster `
	 -Base64ClusterCA certificate-authority `
	 -APIServerEndpoint api-server-endpoint `
	 -DNSClusterIP service-cidr.10
</powershell>
```

### Run a custom AMI due to specific security, compliance, or internal policy requirements
<a name="mng-specify-custom-ami"></a>

For more information, see [Amazon Machine Images (AMI)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) in the *Amazon EC2 User Guide*. The Amazon EKS AMI build specification contains resources and configuration scripts for building a custom Amazon EKS AMI based on Amazon Linux. For more information, see [Amazon EKS AMI Build Specification](https://github.com/awslabs/amazon-eks-ami/) on GitHub. To build custom AMIs installed with other operating systems, see [Amazon EKS Sample Custom AMIs](https://github.com/aws-samples/amazon-eks-custom-amis) on GitHub.

You can't use dynamic parameter references for AMI IDs in launch templates that are used with managed node groups.

**Important**  
When specifying an AMI, Amazon EKS does not validate the Kubernetes version embedded in your AMI against your cluster’s control plane version. You are responsible for ensuring that the Kubernetes version of your custom AMI conforms to the [Kubernetes version skew policy](https://kubernetes.io/releases/version-skew-policy):  
+ The `kubelet` version on your nodes must not be newer than your cluster version.
+ The `kubelet` version on your nodes must be equal to or up to 3 minor versions behind your cluster version (for Kubernetes version `1.28` or higher), or up to 2 minor versions behind your cluster version (for Kubernetes version `1.27` or lower).
Creating managed node groups with version skew violations may result in:
+ Nodes failing to join the cluster
+ Undefined behavior or API incompatibilities
+ Cluster instability or workload failures
When specifying an AMI, Amazon EKS doesn’t merge any user data. Rather, you’re responsible for supplying the required `bootstrap` commands for nodes to join the cluster. If your nodes fail to join the cluster, the Amazon EKS `CreateNodegroup` and `UpdateNodegroupVersion` actions also fail.

## Limits and conditions when specifying an AMI ID
<a name="mng-ami-id-conditions"></a>

The following are the limits and conditions involved with specifying an AMI ID with managed node groups:
+ You must create a new node group to switch between specifying an AMI ID in a launch template and not specifying an AMI ID.
+ You aren’t notified in the console when a newer AMI version is available. To update your node group to a newer AMI version, you need to create a new version of your launch template with an updated AMI ID. Then, you need to update the node group with the new launch template version (see the sketch after this list).
+ The following fields can’t be set in the API if you specify an AMI ID:
  +  `amiType` 
  +  `releaseVersion` 
  +  `version` 
+ Any `taints` set in the API are applied asynchronously if you specify an AMI ID. To apply taints prior to a node joining the cluster, you must pass the taints to `kubelet` in your user data using the `--register-with-taints` command line flag. For more information, see [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) in the Kubernetes documentation.
+ When specifying a custom AMI ID for Windows managed node groups, add `eks:kube-proxy-windows` to your AWS IAM Authenticator configuration map. This is required for DNS to function properly.

  1. Open the AWS IAM Authenticator configuration map for editing.

     ```
     kubectl edit -n kube-system cm aws-auth
     ```

  1. Add this entry to the `groups` list under each `rolearn` associated with Windows nodes. Your configuration map should look similar to [aws-auth-cm-windows.yaml](https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/aws-auth-cm-windows.yaml).

     ```
     - eks:kube-proxy-windows
     ```

  1. Save the file and exit your text editor.
+ For any AMI that uses a custom launch template, the default `HttpPutResponseHopLimit` for managed node groups is set to `2`.
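The following is a minimal sketch of updating a node group to a newer AMI by creating a new launch template version and then pointing the node group at it; the launch template name, version numbers, and AMI ID are placeholder assumptions:

```
# Create a new launch template version that references the updated AMI
aws ec2 create-launch-template-version \
  --launch-template-name my-eks-node-template \
  --source-version 1 \
  --launch-template-data '{"ImageId":"ami-1234567890abcdef0"}'

# Update the node group to the new launch template version
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --region region-code \
  --launch-template name=my-eks-node-template,version=2
```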

# Delete a managed node group from your cluster
<a name="delete-managed-node-group"></a>

This topic describes how you can delete an Amazon EKS managed node group. When you delete a managed node group, Amazon EKS first sets the minimum, maximum, and desired size of your Auto Scaling group to zero. This then causes your node group to scale down.

Before each instance is terminated, Amazon EKS sends a signal to drain that node. During the drain process, Kubernetes does the following for each pod on the node: runs any configured `preStop` lifecycle hooks, sends `SIGTERM` signals to the containers, then waits for the `terminationGracePeriodSeconds` for graceful shutdown. If the node hasn’t been drained after 5 minutes, Amazon EKS lets Auto Scaling continue the forced termination of the instance. After all instances have been terminated, the Auto Scaling group is deleted.

**Important**  
If you delete a managed node group that uses a node IAM role that isn’t used by any other managed node group in the cluster, the role is removed from the `aws-auth` `ConfigMap`. If any of the self-managed node groups in the cluster are using the same node IAM role, the self-managed nodes move to the `NotReady` status, and the cluster operation is disrupted. To add a mapping for the role that you’re using only for the self-managed node groups, see [Create access entries](creating-access-entries.md), if your cluster’s platform version is at least the minimum version listed in the prerequisites section of [Grant IAM users access to Kubernetes with EKS access entries](access-entries.md). If your platform version is earlier than the required minimum version for access entries, you can add the entry back to the `aws-auth` `ConfigMap`. For more information, enter `eksctl create iamidentitymapping --help` in your terminal.

You can delete a managed node group with:
+  [`eksctl`](#eksctl-delete-managed-nodegroup) 
+  [AWS Management Console](#console-delete-managed-nodegroup) 
+  [AWS CLI](#awscli-delete-managed-nodegroup) 

## `eksctl`
<a name="eksctl-delete-managed-nodegroup"></a>

 **Delete a managed node group with `eksctl` ** 

Enter the following command. Replace every `<example value>` with your own values.

```
eksctl delete nodegroup \
  --cluster <my-cluster> \
  --name <my-mng> \
  --region <region-code>
```

For more options, see [Deleting and draining nodegroups](https://eksctl.io/usage/nodegroups/#deleting-and-draining-nodegroups) in the `eksctl` documentation.

## AWS Management Console
<a name="console-delete-managed-nodegroup"></a>

 **Delete a managed node group with AWS Management Console ** 

1. Open the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. On the **Clusters** page, choose the cluster that contains the node group to delete.

1. On the selected cluster page, choose the **Compute** tab.

1. In the **Node groups** section, choose the node group to delete. Then choose **Delete**.

1. In the **Delete node group** confirmation dialog box, enter the name of the node group. Then choose **Delete**.

## AWS CLI
<a name="awscli-delete-managed-nodegroup"></a>

 **Delete a managed node group with AWS CLI** 

1. Enter the following command. Replace every `<example value>` with your own values.

   ```
   aws eks delete-nodegroup \
     --cluster-name <my-cluster> \
     --nodegroup-name <my-mng> \
     --region <region-code>
   ```

1. If the AWS CLI displays the response output in a pager, use the arrow keys on your keyboard to scroll through it. Press the `q` key when you’re finished.

   For more options, see the [`delete-nodegroup`](https://docs.aws.amazon.com/cli/latest/reference/eks/delete-nodegroup.html) command in the *AWS CLI Command Reference*.
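
   If you want a script to block until the node group is fully deleted, newer versions of the AWS CLI include an EKS waiter that you can call after the delete command, shown here as a sketch with the same placeholder values.

   ```
   aws eks wait nodegroup-deleted \
     --cluster-name <my-cluster> \
     --nodegroup-name <my-mng> \
     --region <region-code>
   ```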

# Maintain nodes yourself with self-managed nodes
<a name="worker"></a>

A cluster contains one or more Amazon EC2 nodes that Pods are scheduled on. Amazon EKS nodes run in your AWS account and connect to the control plane of your cluster through the cluster API server endpoint. You’re billed for them based on Amazon EC2 prices. For more information, see [Amazon EC2 pricing](https://aws.amazon.com/ec2/pricing/).

A cluster can contain several node groups. Each node group contains one or more nodes that are deployed in an [Amazon EC2 Auto Scaling group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html). The instance type of the nodes within the group can vary, such as when using [attribute-based instance type selection](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-fleet-attribute-based-instance-type-selection.html) with [Karpenter](https://karpenter.sh/). All instances in a node group must use the [Amazon EKS node IAM role](create-node-role.md).

Amazon EKS provides specialized Amazon Machine Images (AMIs) that are called Amazon EKS optimized AMIs. The AMIs are configured to work with Amazon EKS. Their components include `containerd`, `kubelet`, and the AWS IAM Authenticator. The AMIs also contain a specialized [bootstrap script](https://github.com/awslabs/amazon-eks-ami/blob/main/templates/al2/runtime/bootstrap.sh) that allows nodes to discover and connect to your cluster’s control plane automatically.

If you restrict access to the public endpoint of your cluster using CIDR blocks, we recommend that you also enable private endpoint access. This is so that nodes can communicate with the cluster. Without the private endpoint enabled, the CIDR blocks that you specify for public access must include the egress sources from your VPC. For more information, see [Cluster API server endpoint](cluster-endpoint.md).
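
As a sketch of what that configuration might look like with the AWS CLI, the following command enables private endpoint access while restricting public access to an example CIDR block. The cluster name, Region, and CIDR value are placeholders.

```
aws eks update-cluster-config \
  --region region-code \
  --name my-cluster \
  --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="203.0.113.0/24",endpointPrivateAccess=true
```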

To add self-managed nodes to your Amazon EKS cluster, see the topics that follow. If you launch self-managed nodes manually, add the following tag to each node while making sure that `<cluster-name>` matches your cluster. For more information, see [Adding and deleting tags on an individual resource](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#adding-or-deleting-tags). If you follow the steps in the guides that follow, the required tag is automatically added to nodes for you.


| Key | Value | 
| --- | --- | 
|   `kubernetes.io/cluster/<cluster-name>`   |   `owned`   | 
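
If you launch nodes manually, one way to add the tag is with the AWS CLI, as in the following sketch. The instance ID and cluster name are placeholders.

```
aws ec2 create-tags \
  --resources i-1234567890abcdef0 \
  --tags Key=kubernetes.io/cluster/my-cluster,Value=owned
```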

**Important**  
Tags in the Amazon EC2 Instance Metadata Service (IMDS) are not compatible with EKS nodes. When instance metadata tags are enabled, Amazon EC2 prevents the use of forward slashes ('/') in tag keys. This limitation can cause instance launch failures, particularly when using node management tools like Karpenter or Cluster Autoscaler, because these tools rely on tag keys that contain forward slashes (such as `kubernetes.io/cluster/<cluster-name>`) for proper functionality.

For more information about nodes from a general Kubernetes perspective, see [Nodes](https://kubernetes.io/docs/concepts/architecture/nodes/) in the Kubernetes documentation.

**Topics**
+ [Create self-managed Amazon Linux nodes](launch-workers.md)
+ [Create self-managed Bottlerocket nodes](launch-node-bottlerocket.md)
+ [Create self-managed Microsoft Windows nodes](launch-windows-workers.md)
+ [Create self-managed Ubuntu Linux nodes](launch-node-ubuntu.md)
+ [Update self-managed nodes for your cluster](update-workers.md)

# Create self-managed Amazon Linux nodes
<a name="launch-workers"></a>

This topic describes how you can launch Auto Scaling groups of Linux nodes that register with your Amazon EKS cluster. After the nodes join the cluster, you can deploy Kubernetes applications to them. You can also launch self-managed Amazon Linux nodes with `eksctl` or the AWS Management Console. If you need to launch nodes on AWS Outposts, see [Create Amazon Linux nodes on AWS Outposts](eks-outposts-self-managed-nodes.md).

 **Prerequisites** 
+ An existing Amazon EKS cluster. To deploy one, see [Create an Amazon EKS cluster](create-cluster.md). If you have subnets in the AWS Region where you have AWS Outposts, AWS Wavelength, or AWS Local Zones enabled, those subnets must not have been passed in when you created your cluster.
+ An existing IAM role for the nodes to use. To create one, see [Amazon EKS node IAM role](create-node-role.md). If this role doesn’t have either of the policies for the VPC CNI, the separate role that follows is required for the VPC CNI pods.
+ (Optional, but recommended) The Amazon VPC CNI plugin for Kubernetes add-on configured with its own IAM role that has the necessary IAM policy attached to it. For more information, see [Configure Amazon VPC CNI plugin to use IRSA](cni-iam-role.md).
+ Familiarity with the considerations listed in [Choose an optimal Amazon EC2 node instance type](choosing-instance-type.md). Depending on the instance type you choose, there may be additional prerequisites for your cluster and VPC.

You can launch self-managed Linux nodes using either of the following:
+  [`eksctl`](#eksctl_create_managed_amazon_linux) 
+  [AWS Management Console](#console_create_managed_amazon_linux) 

## `eksctl`
<a name="eksctl_create_managed_amazon_linux"></a>

 **Launch self-managed Linux nodes using `eksctl` ** 

1. Install version `0.215.0` or later of the `eksctl` command line tool on your device or in AWS CloudShell. To install or update `eksctl`, see [Installation](https://eksctl.io/installation) in the `eksctl` documentation.

1. (Optional) If the **AmazonEKS_CNI_Policy** managed IAM policy is attached to your [Amazon EKS node IAM role](create-node-role.md), we recommend assigning it to an IAM role that you associate to the Kubernetes `aws-node` service account instead. For more information, see [Configure Amazon VPC CNI plugin to use IRSA](cni-iam-role.md).

1. The following command creates a node group in an existing cluster. Replace *al-nodes* with a name for your node group. The node group name can’t be longer than 63 characters. It must start with a letter or digit, but can also include hyphens and underscores for the remaining characters. Replace *my-cluster* with the name of your cluster. The name can contain only alphanumeric characters (case-sensitive) and hyphens. It must start with an alphanumeric character and can’t be longer than 100 characters. The name must be unique within the AWS Region and AWS account that you’re creating the cluster in. Replace the remaining *example values* with your own values. By default, the nodes are created with the same Kubernetes version as the control plane.

   Before choosing a value for `--node-type`, review [Choose an optimal Amazon EC2 node instance type](choosing-instance-type.md).

   Replace *my-key* with the name of your Amazon EC2 key pair or public key. This key is used to SSH into your nodes after they launch. If you don’t already have an Amazon EC2 key pair, you can create one in the AWS Management Console. For more information, see [Amazon EC2 key pairs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*.

   Create your node group with the following command.
**Important**  
If you want to deploy a node group to AWS Outposts, Wavelength, or Local Zone subnets, there are additional considerations:  
The subnets must not have been passed in when you created the cluster.
You must create the node group with a config file that specifies the subnets and [`volumeType`](https://eksctl.io/usage/schema/#nodeGroups-volumeType)`: gp2`. For more information, see [Create a nodegroup from a config file](https://eksctl.io/usage/nodegroups/#creating-a-nodegroup-from-a-config-file) and [Config file schema](https://eksctl.io/usage/schema/) in the `eksctl` documentation.

   ```
   eksctl create nodegroup \
     --cluster my-cluster \
     --name al-nodes \
     --node-type t3.medium \
     --nodes 3 \
     --nodes-min 1 \
     --nodes-max 4 \
     --ssh-access \
     --managed=false \
     --ssh-public-key my-key
   ```

   To deploy a node group that:
   + can assign a significantly higher number of IP addresses to Pods than the default configuration, see [Assign more IP addresses to Amazon EKS nodes with prefixes](cni-increase-ip-addresses.md).
   + can assign `IPv4` addresses to Pods from a different CIDR block than that of the instance, see [Deploy Pods in alternate subnets with custom networking](cni-custom-network.md).
   + can assign `IPv6` addresses to Pods and services, see [Learn about IPv6 addresses to clusters, Pods, and services](cni-ipv6.md).
   + doesn’t have outbound internet access, see [Deploy private clusters with limited internet access](private-clusters.md).

     For a complete list of all available options and defaults, enter the following command.

     ```
     eksctl create nodegroup --help
     ```

     If nodes fail to join the cluster, then see [Nodes fail to join cluster](troubleshooting.md#worker-node-fail) in the Troubleshooting chapter.

     An example output is as follows. Several lines are output while the nodes are created. One of the last lines of output is the following example line.

     ```
     [✔]  created 1 nodegroup(s) in cluster "my-cluster"
     ```

1. (Optional) Deploy a [sample application](sample-deployment.md) to test your cluster and Linux nodes.

1. We recommend blocking Pod access to IMDS if the following conditions are true:
   + You plan to assign IAM roles to all of your Kubernetes service accounts so that Pods only have the minimum permissions that they need.
   + No Pods in the cluster require access to the Amazon EC2 instance metadata service (IMDS) for other reasons, such as retrieving the current AWS Region.

   For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).
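
   One way to restrict Pod access, described in that guide, is to require IMDSv2 and reduce the hop limit in the node’s instance metadata options. The following AWS CLI sketch shows the idea for a single instance; the instance ID is a placeholder, and tools such as `eksctl` can apply the equivalent setting when the node group is created.

   ```
   aws ec2 modify-instance-metadata-options \
     --instance-id i-1234567890abcdef0 \
     --http-tokens required \
     --http-put-response-hop-limit 1
   ```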

## AWS Management Console
<a name="console_create_managed_amazon_linux"></a>

 **Step 1: Launch self-managed Linux nodes using AWS Management Console ** 

1. Download the latest version of the AWS CloudFormation template.

   ```
   curl -O https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2025-11-26/amazon-eks-nodegroup.yaml
   ```

1. Wait for your cluster status to show as `ACTIVE`. If you launch your nodes before the cluster is active, the nodes fail to register with the cluster and you will have to relaunch them.

1. Open the [AWS CloudFormation console](https://console.aws.amazon.com/cloudformation/).

1. Choose **Create stack** and then select **With new resources (standard)**.

1. For **Specify template**, select **Upload a template file** and then select **Choose file**.

1. Select the `amazon-eks-nodegroup.yaml` file that you downloaded.

1. Select **Next**.

1. On the **Specify stack details** page, enter the following parameters accordingly, and then choose **Next**:
   +  **Stack name**: Choose a stack name for your AWS CloudFormation stack. For example, you can call it *my-cluster-nodes*. The name can contain only alphanumeric characters (case-sensitive) and hyphens. It must start with an alphanumeric character and can’t be longer than 100 characters. The name must be unique within the AWS Region and AWS account that you’re creating the cluster in.
   +  **ClusterName**: Enter the name that you used when you created your Amazon EKS cluster. This name must equal the cluster name or your nodes can’t join the cluster.
   +  **ClusterControlPlaneSecurityGroup**: Choose the **SecurityGroups** value from the AWS CloudFormation output that you generated when you created your [VPC](creating-a-vpc.md).

     The following steps show one method to retrieve the applicable group.

     1. Open the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

     1. Choose the name of the cluster.

     1. Choose the **Networking** tab.

     1. Use the **Additional security groups** value as a reference when selecting from the **ClusterControlPlaneSecurityGroup** dropdown list.
   +  **ApiServerEndpoint**: Enter the API server endpoint for your EKS cluster. You can find this in the **Details** section of the EKS cluster console.
   +  **CertificateAuthorityData**: Enter the base64-encoded certificate authority data, which you can also find in the **Details** section of the EKS cluster console.
   +  **ServiceCidr**: Enter the CIDR range used for allocating IP addresses to Kubernetes services within the cluster. You can find this on the **Networking** tab of the EKS cluster console.
   +  **AuthenticationMode**: Select the authentication mode in use in the EKS cluster by reviewing the **Access** tab of the EKS cluster console.
   +  **NodeGroupName**: Enter a name for your node group. This name can be used later to identify the Auto Scaling node group that’s created for your nodes. The node group name can’t be longer than 63 characters. It must start with a letter or digit, but can also include hyphens and underscores for the remaining characters.
   +  **NodeAutoScalingGroupMinSize**: Enter the minimum number of nodes that your node Auto Scaling group can scale in to.
   +  **NodeAutoScalingGroupDesiredCapacity**: Enter the desired number of nodes to scale to when your stack is created.
   +  **NodeAutoScalingGroupMaxSize**: Enter the maximum number of nodes that your node Auto Scaling group can scale out to.
   +  **NodeInstanceType**: Choose an instance type for your nodes. For more information, see [Choose an optimal Amazon EC2 node instance type](choosing-instance-type.md).
   +  **NodeImageIdSSMParam**: Pre-populated with the Amazon EC2 Systems Manager parameter of a recent Amazon EKS optimized Amazon Linux 2023 AMI for a variable Kubernetes version. To use a different Kubernetes minor version supported with Amazon EKS, replace *1.XX* with a different [supported version](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html). We recommend specifying the same Kubernetes version as your cluster.

     You can also replace *amazon-linux-2023* with a different AMI type. For more information, see [Retrieve recommended Amazon Linux AMI IDs](retrieve-ami-id.md).
**Note**  
The Amazon EKS node AMIs are based on Amazon Linux. You can track security or privacy events for Amazon Linux 2023 at the [Amazon Linux Security Center](https://alas.aws.amazon.com/alas2023.html) or subscribe to the associated [RSS feed](https://alas.aws.amazon.com/AL2023/alas.rss). Security and privacy events include an overview of the issue, what packages are affected, and how to update your instances to correct the issue.
   +  **NodeImageId**: (Optional) If you’re using your own custom AMI (instead of an Amazon EKS optimized AMI), enter a node AMI ID for your AWS Region. If you specify a value here, it overrides any values in the **NodeImageIdSSMParam** field.
   +  **NodeVolumeSize**: Specify a root volume size for your nodes, in GiB.
   +  **NodeVolumeType**: Specify a root volume type for your nodes.
   +  **KeyName**: Enter the name of an Amazon EC2 SSH key pair that you can use to connect to your nodes using SSH after they launch. If you don’t already have an Amazon EC2 key pair, you can create one in the AWS Management Console. For more information, see [Amazon EC2 key pairs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*.
   +  **VpcId**: Enter the ID for the [VPC](creating-a-vpc.md) that you created.
   +  **Subnets**: Choose the subnets that you created for your VPC. If you created your VPC using the steps that are described in [Create an Amazon VPC for your Amazon EKS cluster](creating-a-vpc.md), specify only the private subnets within the VPC for your nodes to launch into. You can see which subnets are private by opening each subnet link from the **Networking** tab of your cluster.
**Important**  
If any of the subnets are public subnets, then they must have the automatic public IP address assignment setting enabled. If the setting isn’t enabled for the public subnet, then any nodes that you deploy to that public subnet won’t be assigned a public IP address and won’t be able to communicate with the cluster or other AWS services. If the subnet was deployed before March 26, 2020 using either of the [Amazon EKS AWS CloudFormation VPC templates](creating-a-vpc.md), or by using `eksctl`, then automatic public IP address assignment is disabled for public subnets. For information about how to enable public IP address assignment for a subnet, see [Modifying the public IPv4 addressing attribute for your subnet](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-ip-addressing.html#subnet-public-ip). If the node is deployed to a private subnet, then it’s able to communicate with the cluster and other AWS services through a NAT gateway.
If the subnets don’t have internet access, make sure that you’re aware of the considerations and extra steps in [Deploy private clusters with limited internet access](private-clusters.md).
If you select AWS Outposts, Wavelength, or Local Zone subnets, the subnets must not have been passed in when you created the cluster.

1. Select your desired choices on the **Configure stack options** page, and then choose **Next**.

1. Select the check box to the left of **I acknowledge that AWS CloudFormation might create IAM resources.**, and then choose **Create stack**.

1. When your stack has finished creating, select it in the console and choose **Outputs**. If you are using the `EKS API` or `EKS API and ConfigMap` Authentication Modes, this is the last step.

1. If you are using the `ConfigMap` Authentication Mode, record the **NodeInstanceRole** for the node group that was created.

 **Step 2: Enable nodes to join your cluster** 

**Note**  
The following two steps are only needed if you’re using the `ConfigMap` authentication mode in the EKS cluster. Additionally, if you launched nodes inside a private VPC without outbound internet access, make sure to enable nodes to join your cluster from within the VPC.

1. Check to see if you already have an `aws-auth` `ConfigMap`.

   ```
   kubectl describe configmap -n kube-system aws-auth
   ```

1. If you are shown an `aws-auth` `ConfigMap`, then update it as needed.

   1. Open the `ConfigMap` for editing.

      ```
      kubectl edit -n kube-system configmap/aws-auth
      ```

   1. Add a new `mapRoles` entry as needed. Set the `rolearn` value to the **NodeInstanceRole** value that you recorded in the previous procedure.

      ```
      [...]
      data:
        mapRoles: |
          - rolearn: <ARN of instance role (not instance profile)>
            username: system:node:{{EC2PrivateDNSName}}
            groups:
              - system:bootstrappers
              - system:nodes
      [...]
      ```

   1. Save the file and exit your text editor.

1. If you received an error stating `Error from server (NotFound): configmaps "aws-auth" not found`, then apply the stock `ConfigMap`.

   1. Download the configuration map.

      ```
      curl -O https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/aws-auth-cm.yaml
      ```

   1. In the `aws-auth-cm.yaml` file, set the `rolearn` value to the **NodeInstanceRole** value that you recorded in the previous procedure. You can do this with a text editor, or by replacing *my-node-instance-role* and running the following command:

      ```
      sed -i.bak -e 's|<ARN of instance role (not instance profile)>|my-node-instance-role|' aws-auth-cm.yaml
      ```

   1. Apply the configuration. This command may take a few minutes to finish.

      ```
      kubectl apply -f aws-auth-cm.yaml
      ```

1. Watch the status of your nodes and wait for them to reach the `Ready` status.

   ```
   kubectl get nodes --watch
   ```

   Enter `Ctrl`+`C` to return to a shell prompt.
**Note**  
If you receive any authorization or resource type errors, see [Unauthorized or access denied (`kubectl`)](troubleshooting.md#unauthorized) in the troubleshooting topic.

   If nodes fail to join the cluster, then see [Nodes fail to join cluster](troubleshooting.md#worker-node-fail) in the Troubleshooting chapter.

1. (GPU nodes only) If you chose a GPU instance type and the Amazon EKS optimized accelerated AMI, you must apply the [NVIDIA device plugin for Kubernetes](https://github.com/NVIDIA/k8s-device-plugin) as a DaemonSet on your cluster. Replace *vX.X.X* with your desired [NVIDIA/k8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin/releases) version before running the following command.

   ```
   kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/vX.X.X/deployments/static/nvidia-device-plugin.yml
   ```

 **Step 3: Additional actions** 

1. (Optional) Deploy a [sample application](sample-deployment.md) to test your cluster and Linux nodes.

1. (Optional) If the **AmazonEKS_CNI_Policy** managed IAM policy (if you have an `IPv4` cluster) or the *AmazonEKS_CNI_IPv6_Policy* (that you [created yourself](cni-iam-role.md#cni-iam-role-create-ipv6-policy) if you have an `IPv6` cluster) is attached to your [Amazon EKS node IAM role](create-node-role.md), we recommend assigning it to an IAM role that you associate to the Kubernetes `aws-node` service account instead. For more information, see [Configure Amazon VPC CNI plugin to use IRSA](cni-iam-role.md).

1. We recommend blocking Pod access to IMDS if the following conditions are true:
   + You plan to assign IAM roles to all of your Kubernetes service accounts so that Pods only have the minimum permissions that they need.
   + No Pods in the cluster require access to the Amazon EC2 instance metadata service (IMDS) for other reasons, such as retrieving the current AWS Region.

   For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).

# Create self-managed Bottlerocket nodes
<a name="launch-node-bottlerocket"></a>

**Note**  
Managed node groups might offer some advantages for your use case. For more information, see [Simplify node lifecycle with managed node groups](managed-node-groups.md).

This topic describes how to launch Auto Scaling groups of [Bottlerocket](https://aws.amazon.com/bottlerocket/) nodes that register with your Amazon EKS cluster. Bottlerocket is a Linux-based open-source operating system from AWS that you can use for running containers on virtual machines or bare metal hosts. After the nodes join the cluster, you can deploy Kubernetes applications to them. For more information about Bottlerocket, see [Using a Bottlerocket AMI with Amazon EKS](https://github.com/bottlerocket-os/bottlerocket/blob/develop/QUICKSTART-EKS.md) on GitHub and [Custom AMI support](https://eksctl.io/usage/custom-ami-support/) in the `eksctl` documentation.

For information about in-place upgrades, see [Bottlerocket Update Operator](https://github.com/bottlerocket-os/bottlerocket-update-operator) on GitHub.

**Important**  
Amazon EKS nodes are standard Amazon EC2 instances, and you are billed for them based on normal Amazon EC2 instance prices. For more information, see [Amazon EC2 pricing](https://aws.amazon.com/ec2/pricing/).
You can launch Bottlerocket nodes in Amazon EKS extended clusters on AWS Outposts, but you can’t launch them in local clusters on AWS Outposts. For more information, see [Deploy Amazon EKS on-premises with AWS Outposts](eks-outposts.md).
You can deploy to Amazon EC2 instances with `x86` or Arm processors. However, you can’t deploy to instances that have Inferentia chips.
Bottlerocket is compatible with AWS CloudFormation. However, there is no official CloudFormation template that can be copied to deploy Bottlerocket nodes for Amazon EKS.
Bottlerocket images don’t come with an SSH server or a shell. However, you can use out-of-band access methods to allow SSH by enabling the admin container and to pass some bootstrapping configuration steps with user data. For more information, see these sections in the [Bottlerocket README.md](https://github.com/bottlerocket-os/bottlerocket) on GitHub:  
 [Exploration](https://github.com/bottlerocket-os/bottlerocket#exploration) 
 [Admin container](https://github.com/bottlerocket-os/bottlerocket#admin-container) 
 [Kubernetes settings](https://github.com/bottlerocket-os/bottlerocket#kubernetes-settings) 
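
As a minimal sketch of the setting involved, the following TOML snippet in instance user data enables the Bottlerocket admin container so that you can get SSH access through it. Treat this as an illustration only; see the Bottlerocket documentation linked above for the full configuration options.

```
[settings.host-containers.admin]
enabled = true
```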

This procedure requires `eksctl` version `0.215.0` or later. You can check your version with the following command:

```
eksctl version
```

For instructions on how to install or upgrade `eksctl`, see [Installation](https://eksctl.io/installation) in the `eksctl` documentation.

**Note**  
This procedure only works for clusters that were created with `eksctl`.

1. Copy the following contents to your device. Replace *my-cluster* with the name of your cluster. The name can contain only alphanumeric characters (case-sensitive) and hyphens. It must start with an alphanumeric character and can’t be longer than 100 characters. The name must be unique within the AWS Region and AWS account that you’re creating the cluster in. Replace *ng-bottlerocket* with a name for your node group. The node group name can’t be longer than 63 characters. It must start with a letter or digit, but can also include hyphens and underscores for the remaining characters. To deploy on Arm instances, replace *m5.large* with an Arm instance type. Replace *my-ec2-keypair-name* with the name of an Amazon EC2 SSH key pair that you can use to connect to your nodes using SSH after they launch. If you don’t already have an Amazon EC2 key pair, you can create one in the AWS Management Console. For more information, see [Amazon EC2 key pairs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*. Replace all remaining example values with your own values. After you’ve made the replacements, run the modified command to create the `bottlerocket.yaml` file.

   If specifying an Arm Amazon EC2 instance type, then review the considerations in [Amazon EKS optimized Arm Amazon Linux AMIs](eks-optimized-ami.md#arm-ami) before deploying. For instructions on how to deploy using a custom AMI, see [Building Bottlerocket](https://github.com/bottlerocket-os/bottlerocket/blob/develop/BUILDING.md) on GitHub and [Custom AMI support](https://eksctl.io/usage/custom-ami-support/) in the `eksctl` documentation. To deploy a managed node group, deploy a custom AMI using a launch template. For more information, see [Customize managed nodes with launch templates](launch-templates.md).
**Important**  
To deploy a node group to AWS Outposts, AWS Wavelength, or AWS Local Zone subnets, don’t pass AWS Outposts, AWS Wavelength, or AWS Local Zone subnets when you create the cluster. You must specify the subnets in the following example. For more information see [Create a nodegroup from a config file](https://eksctl.io/usage/nodegroups/#creating-a-nodegroup-from-a-config-file) and [Config file schema](https://eksctl.io/usage/schema/) in the `eksctl` documentation. Replace *region-code* with the AWS Region that your cluster is in.

   ```
   cat >bottlerocket.yaml <<EOF
   ---
   apiVersion: eksctl.io/v1alpha5
   kind: ClusterConfig
   
   metadata:
     name: my-cluster
     region: region-code
     version: '1.35'
   
   iam:
     withOIDC: true
   
   nodeGroups:
     - name: ng-bottlerocket
       instanceType: m5.large
       desiredCapacity: 3
       amiFamily: Bottlerocket
       ami: auto-ssm
       iam:
          attachPolicyARNs:
             - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
             - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
             - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
             - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
       ssh:
           allow: true
           publicKeyName: my-ec2-keypair-name
   EOF
   ```

1. Deploy your nodes with the following command.

   ```
   eksctl create nodegroup --config-file=bottlerocket.yaml
   ```

   An example output is as follows.

   Several lines are output while the nodes are created. One of the last lines of output is the following example line.

   ```
   [✔]  created 1 nodegroup(s) in cluster "my-cluster"
   ```

1. (Optional) Create a Kubernetes [persistent volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) on a Bottlerocket node using the [Amazon EBS CSI Plugin](https://github.com/kubernetes-sigs/aws-ebs-csi-driver). The default Amazon EBS driver relies on file system tools that aren’t included with Bottlerocket. For more information about creating a storage class using the driver, see [Use Kubernetes volume storage with Amazon EBS](ebs-csi.md).
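
   A StorageClass that uses the EBS CSI driver might look like the following minimal sketch, assuming the driver is installed in your cluster. The class name and `gp3` volume type are example values.

   ```
   apiVersion: storage.k8s.io/v1
   kind: StorageClass
   metadata:
     name: ebs-sc
   provisioner: ebs.csi.aws.com
   volumeBindingMode: WaitForFirstConsumer
   parameters:
     type: gp3
   ```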

1. (Optional) By default, `kube-proxy` sets the `nf_conntrack_max` kernel parameter to a default value that may differ from what Bottlerocket originally sets at boot. To keep Bottlerocket’s [default setting](https://github.com/bottlerocket-os/bottlerocket-core-kit/blob/develop/packages/release/release-sysctl.conf), edit the `kube-proxy` configuration with the following command.

   ```
   kubectl edit -n kube-system daemonset kube-proxy
   ```

   Add `--conntrack-max-per-core` and `--conntrack-min` to the `kube-proxy` arguments that are in the following example. A setting of `0` implies no change.

   ```
         containers:
         - command:
           - kube-proxy
           - --v=2
           - --config=/var/lib/kube-proxy-config/config
           - --conntrack-max-per-core=0
           - --conntrack-min=0
   ```

1. (Optional) Deploy a [sample application](sample-deployment.md) to test your Bottlerocket nodes.

1. We recommend blocking Pod access to IMDS if the following conditions are true:
   + You plan to assign IAM roles to all of your Kubernetes service accounts so that Pods only have the minimum permissions that they need.
   + No Pods in the cluster require access to the Amazon EC2 instance metadata service (IMDS) for other reasons, such as retrieving the current AWS Region.

   For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).

# Create self-managed Microsoft Windows nodes
<a name="launch-windows-workers"></a>

This topic describes how to launch Auto Scaling groups of Windows nodes that register with your Amazon EKS cluster. After the nodes join the cluster, you can deploy Kubernetes applications to them.

**Important**  
Amazon EKS nodes are standard Amazon EC2 instances, and you are billed for them based on normal Amazon EC2 instance prices. For more information, see [Amazon EC2 pricing](https://aws.amazon.com/ec2/pricing/).
You can launch Windows nodes in Amazon EKS extended clusters on AWS Outposts, but you can’t launch them in local clusters on AWS Outposts. For more information, see [Deploy Amazon EKS on-premises with AWS Outposts](eks-outposts.md).

Enable Windows support for your cluster. We recommend that you review important considerations before you launch a Windows node group. For more information, see [Enable Windows support](windows-support.md#enable-windows-support).

You can launch self-managed Windows nodes with either of the following:
+  [`eksctl`](#eksctl_create_windows_nodes) 
+  [AWS Management Console](#console_create_windows_nodes) 

## `eksctl`
<a name="eksctl_create_windows_nodes"></a>

 **Launch self-managed Windows nodes using `eksctl` ** 

This procedure requires that you have installed `eksctl`, and that your `eksctl` version is at least `0.215.0`. You can check your version with the following command.

```
eksctl version
```

For instructions on how to install or upgrade `eksctl`, see [Installation](https://eksctl.io/installation) in the `eksctl` documentation.

**Note**  
This procedure only works for clusters that were created with `eksctl`.

1. (Optional) If the **AmazonEKS_CNI_Policy** managed IAM policy (if you have an `IPv4` cluster) or the *AmazonEKS_CNI_IPv6_Policy* (that you [created yourself](cni-iam-role.md#cni-iam-role-create-ipv6-policy) if you have an `IPv6` cluster) is attached to your [Amazon EKS node IAM role](create-node-role.md), we recommend assigning it to an IAM role that you associate to the Kubernetes `aws-node` service account instead. For more information, see [Configure Amazon VPC CNI plugin to use IRSA](cni-iam-role.md).

1. This procedure assumes that you have an existing cluster. If you don’t already have an Amazon EKS cluster and an Amazon Linux node group to add a Windows node group to, we recommend that you follow [Get started with Amazon EKS – `eksctl`](getting-started-eksctl.md). This guide provides a complete walkthrough for how to create an Amazon EKS cluster with Amazon Linux nodes.

   Create your node group with the following command. Replace *region-code* with the AWS Region that your cluster is in. Replace *my-cluster* with your cluster name. The name can contain only alphanumeric characters (case-sensitive) and hyphens. It must start with an alphanumeric character and can’t be longer than 100 characters. The name must be unique within the AWS Region and AWS account that you’re creating the cluster in. Replace *ng-windows* with a name for your node group. The node group name can’t be longer than 63 characters. It must start with a letter or digit, but can also include hyphens and underscores for the remaining characters. You can replace *2019* with `2022` to use Windows Server 2022 or `2025` to use Windows Server 2025. Replace the rest of the example values with your own values.
**Important**  
To deploy a node group to AWS Outposts, AWS Wavelength, or AWS Local Zone subnets, don’t pass the AWS Outposts, Wavelength, or Local Zone subnets when you create the cluster. Create the node group with a config file, specifying the AWS Outposts, Wavelength, or Local Zone subnets. For more information, see [Create a nodegroup from a config file](https://eksctl.io/usage/nodegroups/#creating-a-nodegroup-from-a-config-file) and [Config file schema](https://eksctl.io/usage/schema/) in the `eksctl` documentation.

   ```
   eksctl create nodegroup \
       --region region-code \
       --cluster my-cluster \
       --name ng-windows \
       --node-type t2.large \
       --nodes 3 \
       --nodes-min 1 \
       --nodes-max 4 \
       --managed=false \
       --node-ami-family WindowsServer2019FullContainer
   ```
**Note**  
If nodes fail to join the cluster, see [Nodes fail to join cluster](troubleshooting.md#worker-node-fail) in the Troubleshooting guide.
To see the available options for `eksctl` commands, enter the following command.  

     ```
      eksctl <command> --help
     ```

   An example output is as follows. Several lines are output while the nodes are created. One of the last lines of output is the following example line.

   ```
   [✔]  created 1 nodegroup(s) in cluster "my-cluster"
   ```

1. (Optional) Deploy a [sample application](sample-deployment.md) to test your cluster and Windows nodes.

1. We recommend blocking Pod access to IMDS if the following conditions are true:
   + You plan to assign IAM roles to all of your Kubernetes service accounts so that Pods only have the minimum permissions that they need.
   + No Pods in the cluster require access to the Amazon EC2 instance metadata service (IMDS) for other reasons, such as retrieving the current AWS Region.

   For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).

## AWS Management Console
<a name="console_create_windows_nodes"></a>

 **Prerequisites** 
+ An existing Amazon EKS cluster and a Linux node group. If you don’t have these resources, we recommend that you create them using one of our guides in [Get started with Amazon EKS](getting-started.md). These guides describe how to create an Amazon EKS cluster with Linux nodes.
+ An existing VPC and security group that meet the requirements for an Amazon EKS cluster. For more information, see [View Amazon EKS networking requirements for VPC and subnets](network-reqs.md) and [View Amazon EKS security group requirements for clusters](sec-group-reqs.md). The guides in [Get started with Amazon EKS](getting-started.md) create a VPC that meets the requirements. Alternatively, you can also follow [Create an Amazon VPC for your Amazon EKS cluster](creating-a-vpc.md) to create one manually.
+ An existing Amazon EKS cluster that uses a VPC and security group that meets the requirements of an Amazon EKS cluster. For more information, see [Create an Amazon EKS cluster](create-cluster.md). If you have subnets in the AWS Region where you have AWS Outposts, AWS Wavelength, or AWS Local Zones enabled, those subnets must not have been passed in when you created the cluster.

 **Step 1: Launch self-managed Windows nodes using the AWS Management Console ** 

1. Wait for your cluster status to show as `ACTIVE`. If you launch your nodes before the cluster is active, the nodes fail to register with the cluster and you need to relaunch them.

1. Open the [AWS CloudFormation console](https://console.aws.amazon.com/cloudformation/) 

1. Choose **Create stack**.

1. For **Specify template**, select **Amazon S3 URL**.

1. Copy the following URL and paste it into **Amazon S3 URL**.

   ```
   https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2023-02-09/amazon-eks-windows-nodegroup.yaml
   ```

1. Select **Next** twice.

1. On the **Quick create stack** page, enter the following parameters accordingly:
   +  **Stack name**: Choose a stack name for your AWS CloudFormation stack. For example, you can call it `my-cluster-nodes`.
   +  **ClusterName**: Enter the name that you used when you created your Amazon EKS cluster.
**Important**  
This name must exactly match the name that you used in [Step 1: Create your Amazon EKS cluster](getting-started-console.md#eks-create-cluster). Otherwise, your nodes can’t join the cluster.
   +  **ClusterControlPlaneSecurityGroup**: Choose the security group from the AWS CloudFormation output that you generated when you created your [VPC](creating-a-vpc.md). The following steps show one method to retrieve the applicable group.

     1. Open the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

     1. Choose the name of the cluster.

     1. Choose the **Networking** tab.

     1. Use the **Additional security groups** value as a reference when selecting from the **ClusterControlPlaneSecurityGroup** dropdown list.
   +  **NodeGroupName**: Enter a name for your node group. This name can be used later to identify the Auto Scaling node group that’s created for your nodes. The node group name can’t be longer than 63 characters. It must start with a letter or digit, but can also include hyphens and underscores for the remaining characters.
   +  **NodeAutoScalingGroupMinSize**: Enter the minimum number of nodes that your node Auto Scaling group can scale in to.
   +  **NodeAutoScalingGroupDesiredCapacity**: Enter the desired number of nodes to scale to when your stack is created.
   +  **NodeAutoScalingGroupMaxSize**: Enter the maximum number of nodes that your node Auto Scaling group can scale out to.
   +  **NodeInstanceType**: Choose an instance type for your nodes. For more information, see [Choose an optimal Amazon EC2 node instance type](choosing-instance-type.md).
**Note**  
The supported instance types for the latest version of the [Amazon VPC CNI plugin for Kubernetes](https://github.com/aws/amazon-vpc-cni-k8s) are listed in [vpc_ip_resource_limit.go](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/vpc/vpc_ip_resource_limit.go) on GitHub. You might need to update your CNI version to use the latest supported instance types. For more information, see [Assign IPs to Pods with the Amazon VPC CNI](managing-vpc-cni.md).
   +  **NodeImageIdSSMParam**: Pre-populated with the Amazon EC2 Systems Manager parameter of the current recommended Amazon EKS optimized Windows Core AMI ID. To use the full version of Windows, replace *Core* with `Full`.
   +  **NodeImageId**: (Optional) If you’re using your own custom AMI (instead of an Amazon EKS optimized AMI), enter a node AMI ID for your AWS Region. If you specify a value for this field, it overrides any values in the **NodeImageIdSSMParam** field.
   +  **NodeVolumeSize**: Specify a root volume size for your nodes, in GiB.
   +  **KeyName**: Enter the name of an Amazon EC2 SSH key pair that you can use to connect to your nodes using SSH after they launch. If you don’t already have an Amazon EC2 key pair, you can create one in the AWS Management Console. For more information, see [Amazon EC2 key pairs](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*.
**Note**  
If you don’t provide a key pair here, the AWS CloudFormation stack fails to be created.
   +  **BootstrapArguments**: Specify any optional arguments to pass to the node bootstrap script, such as extra `kubelet` arguments using `-KubeletExtraArgs`.
   +  **DisableIMDSv1**: By default, each node supports the Instance Metadata Service Version 1 (IMDSv1) and IMDSv2. You can disable IMDSv1. To prevent future nodes and Pods in the node group from using IMDSv1, set **DisableIMDSv1** to **true**. For more information about IMDS, see [Configuring the instance metadata service](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html).
   +  **VpcId**: Select the ID for the [VPC](creating-a-vpc.md) that you created.
   +  **NodeSecurityGroups**: Select the security group that was created for your Linux node group when you created your [VPC](creating-a-vpc.md). If your Linux nodes have more than one security group attached to them, specify all of them. This is the case, for example, if the Linux node group was created with `eksctl`.
   +  **Subnets**: Choose the subnets that you created. If you created your VPC using the steps in [Create an Amazon VPC for your Amazon EKS cluster](creating-a-vpc.md), then specify only the private subnets within the VPC for your nodes to launch into.
**Important**  
If any of the subnets are public subnets, then they must have the automatic public IP address assignment setting enabled. If the setting isn’t enabled for the public subnet, then any nodes that you deploy to that public subnet won’t be assigned a public IP address and won’t be able to communicate with the cluster or other AWS services. If the subnet was deployed before March 26, 2020 using either of the [Amazon EKS AWS CloudFormation VPC templates](creating-a-vpc.md), or by using `eksctl`, then automatic public IP address assignment is disabled for public subnets. For information about how to enable public IP address assignment for a subnet, see [Modifying the public IPv4 addressing attribute for your subnet](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-ip-addressing.html#subnet-public-ip). If the node is deployed to a private subnet, then it’s able to communicate with the cluster and other AWS services through a NAT gateway.
If the subnets don’t have internet access, then make sure that you’re aware of the considerations and extra steps in [Deploy private clusters with limited internet access](private-clusters.md).
If you select AWS Outposts, Wavelength, or Local Zone subnets, then the subnets must not have been passed in when you created the cluster.

1. Acknowledge that the stack might create IAM resources, and then choose **Create stack**.

1. When your stack has finished creating, select it in the console and choose **Outputs**.

1. Record the **NodeInstanceRole** for the node group that was created. You need this when you configure your Amazon EKS Windows nodes.

 **Step 2: Enable nodes to join your cluster** 

1. Check to see if you already have an `aws-auth` `ConfigMap`.

   ```
   kubectl describe configmap -n kube-system aws-auth
   ```

1. If you are shown an `aws-auth` `ConfigMap`, then update it as needed.

   1. Open the `ConfigMap` for editing.

      ```
      kubectl edit -n kube-system configmap/aws-auth
      ```

   1. Add new `mapRoles` entries as needed. Set the `rolearn` values to the **NodeInstanceRole** values that you recorded in the previous procedures.

      ```
      [...]
      data:
        mapRoles: |
          - rolearn: <ARN of linux instance role (not instance profile)>
            username: system:node:{{EC2PrivateDNSName}}
            groups:
              - system:bootstrappers
              - system:nodes
          - rolearn: <ARN of windows instance role (not instance profile)>
            username: system:node:{{EC2PrivateDNSName}}
            groups:
              - system:bootstrappers
              - system:nodes
              - eks:kube-proxy-windows
      [...]
      ```

   1. Save the file and exit your text editor.

1. If you received an error stating "`Error from server (NotFound): configmaps "aws-auth" not found`, then apply the stock `ConfigMap`.

   1. Download the configuration map.

      ```
      curl -O https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/aws-auth-cm-windows.yaml
      ```

   1. In the `aws-auth-cm-windows.yaml` file, set the `rolearn` values to the applicable **NodeInstanceRole** values that you recorded in the previous procedures. You can do this with a text editor, or by replacing the example values and running the following command:

      ```
      sed -i.bak -e 's|<ARN of linux instance role (not instance profile)>|my-node-linux-instance-role|' \
          -e 's|<ARN of windows instance role (not instance profile)>|my-node-windows-instance-role|' aws-auth-cm-windows.yaml
      ```
**Important**  
Don’t modify any other lines in this file.
Don’t use the same IAM role for both Windows and Linux nodes.

   1. Apply the configuration. This command might take a few minutes to finish.

      ```
      kubectl apply -f aws-auth-cm-windows.yaml
      ```

1. Watch the status of your nodes and wait for them to reach the `Ready` status.

   ```
   kubectl get nodes --watch
   ```

   Enter `Ctrl`+`C` to return to a shell prompt.
**Note**  
If you receive any authorization or resource type errors, see [Unauthorized or access denied (`kubectl`)](troubleshooting.md#unauthorized) in the troubleshooting topic.

   If nodes fail to join the cluster, then see [Nodes fail to join cluster](troubleshooting.md#worker-node-fail) in the Troubleshooting chapter.

 **Step 3: Additional actions** 

1. (Optional) Deploy a [sample application](sample-deployment.md) to test your cluster and Windows nodes.

1. (Optional) If the **AmazonEKS_CNI_Policy** managed IAM policy (if you have an `IPv4` cluster) or the *AmazonEKS_CNI_IPv6_Policy* (that you [created yourself](cni-iam-role.md#cni-iam-role-create-ipv6-policy) if you have an `IPv6` cluster) is attached to your [Amazon EKS node IAM role](create-node-role.md), we recommend assigning it to an IAM role that you associate to the Kubernetes `aws-node` service account instead. For more information, see [Configure Amazon VPC CNI plugin to use IRSA](cni-iam-role.md).

1. We recommend blocking Pod access to IMDS if the following conditions are true:
   + You plan to assign IAM roles to all of your Kubernetes service accounts so that Pods only have the minimum permissions that they need.
   + No Pods in the cluster require access to the Amazon EC2 instance metadata service (IMDS) for other reasons, such as retrieving the current AWS Region.

   For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).

# Create self-managed Ubuntu Linux nodes
<a name="launch-node-ubuntu"></a>

**Note**  
Managed node groups might offer some advantages for your use case. For more information, see [Simplify node lifecycle with managed node groups](managed-node-groups.md).

This topic describes how to launch Auto Scaling groups of [Ubuntu on Amazon Elastic Kubernetes Service (EKS)](https://cloud-images.ubuntu.com/aws-eks/) or [Ubuntu Pro on Amazon Elastic Kubernetes Service (EKS)](https://ubuntu.com/blog/ubuntu-pro-for-eks-is-now-generally-available) nodes that register with your Amazon EKS cluster. Ubuntu and Ubuntu Pro for EKS are based on the official Ubuntu Minimal LTS, include the custom AWS kernel that is jointly developed with AWS, and have been built specifically for EKS. Ubuntu Pro adds additional security coverage by supporting EKS extended support periods, kernel livepatch, FIPS compliance and the ability to run unlimited Pro containers.

After the nodes join the cluster, you can deploy containerized applications to them. For more information, visit the documentation for [Ubuntu on AWS](https://documentation.ubuntu.com/aws/en/latest/) and [Custom AMI support](https://eksctl.io/usage/custom-ami-support/) in the `eksctl` documentation.

**Important**  
Amazon EKS nodes are standard Amazon EC2 instances, and you are billed for them based on normal Amazon EC2 instance prices. For more information, see [Amazon EC2 pricing](https://aws.amazon.com/ec2/pricing/).
You can launch Ubuntu nodes in Amazon EKS extended clusters on AWS Outposts, but you can’t launch them in local clusters on AWS Outposts. For more information, see [Deploy Amazon EKS on-premises with AWS Outposts](eks-outposts.md).
You can deploy to Amazon EC2 instances with `x86` or Arm processors. However, for instances that have Inferentia chips, you might first need to install the [Neuron SDK](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/).

This procedure requires `eksctl` version `0.215.0` or later. You can check your version with the following command:

```
eksctl version
```

For instructions on how to install or upgrade `eksctl`, see [Installation](https://eksctl.io/installation) in the `eksctl` documentation.

**Note**  
This procedure only works for clusters that were created with `eksctl`.

1. Copy the following contents to your device. Replace `my-cluster` with the name of your cluster. The name can contain only alphanumeric characters (case-sensitive) and hyphens. It must start with an alphanumeric character and can’t be longer than 100 characters. Replace `ng-ubuntu` with a name for your node group. The node group name can’t be longer than 63 characters. It must start with a letter or digit, but can also include hyphens and underscores for the remaining characters. To deploy on Arm instances, replace `m5.large` with an Arm instance type. Replace `my-ec2-keypair-name` with the name of an Amazon EC2 SSH key pair that you can use to connect to your nodes using SSH after they launch. If you don’t already have an Amazon EC2 key pair, you can create one in the AWS Management Console. For more information, see [Amazon EC2 key pairs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*. Replace all remaining example values with your own values. After you’ve made the replacements, run the modified command to create the `ubuntu.yaml` file.
**Important**  
To deploy a node group to AWS Outposts, AWS Wavelength, or AWS Local Zone subnets, don’t pass AWS Outposts, AWS Wavelength, or AWS Local Zone subnets when you create the cluster. You must specify the subnets in the following example. For more information see [Create a nodegroup from a config file](https://eksctl.io/usage/nodegroups/#creating-a-nodegroup-from-a-config-file) and [Config file schema](https://eksctl.io/usage/schema/) in the `eksctl` documentation. Replace *region-code* with the AWS Region that your cluster is in.

   ```
   cat >ubuntu.yaml <<EOF
   ---
   apiVersion: eksctl.io/v1alpha5
   kind: ClusterConfig
   
   metadata:
     name: my-cluster
     region: region-code
     version: '1.35'
   
   iam:
     withOIDC: true
   
   nodeGroups:
     - name: ng-ubuntu
       instanceType: m5.large
       desiredCapacity: 3
       amiFamily: Ubuntu2204
       iam:
          attachPolicyARNs:
             - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
             - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
             - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
             - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
       ssh:
           allow: true
           publicKeyName: my-ec2-keypair-name
   EOF
   ```

   To create an Ubuntu Pro node group, just change the `amiFamily` value to `UbuntuPro2204`.

1. Deploy your nodes with the following command.

   ```
   eksctl create nodegroup --config-file=ubuntu.yaml
   ```

   An example output is as follows.

   Several lines are output while the nodes are created. One of the last lines of output is the following example line.

   ```
   [✔]  created 1 nodegroup(s) in cluster "my-cluster"
   ```

1. (Optional) Deploy a [sample application](sample-deployment.md) to test your Ubuntu nodes.

1. We recommend blocking Pod access to IMDS if the following conditions are true:
   + You plan to assign IAM roles to all of your Kubernetes service accounts so that Pods only have the minimum permissions that they need.
   + No Pods in the cluster require access to the Amazon EC2 instance metadata service (IMDS) for other reasons, such as retrieving the current AWS Region.

   For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).

# Update self-managed nodes for your cluster
<a name="update-workers"></a>

When a new Amazon EKS optimized AMI is released, consider replacing the nodes in your self-managed node group with the new AMI. Likewise, if you have updated the Kubernetes version for your Amazon EKS cluster, update your self-managed nodes to the same Kubernetes version.

**Important**  
This topic covers node updates for self-managed nodes. If you are using [managed node groups](managed-node-groups.md), see [Update a managed node group for your cluster](update-managed-node-group.md).

There are two basic ways to update self-managed node groups in your clusters to use a new AMI:

 ** [Migrate applications to a new node group](migrate-stack.md) **   
Create a new node group and migrate your Pods to that group. Migrating to a new node group is more graceful than simply updating the AMI ID in an existing AWS CloudFormation stack. This is because the migration process [taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) the old node group as `NoSchedule` and drains the nodes after a new stack is ready to accept the existing Pod workload.

 ** [Update an AWS CloudFormation node stack](update-stack.md) **   
Update the AWS CloudFormation stack for an existing node group to use the new AMI. This method isn’t supported for node groups that were created with `eksctl`.

# Migrate applications to a new node group
<a name="migrate-stack"></a>

This topic describes how you can create a new node group, gracefully migrate your existing applications to the new group, and remove the old node group from your cluster. You can migrate to a new node group using `eksctl` or the AWS Management Console.
+  [`eksctl`](#eksctl_migrate_apps) 
+  [AWS Management Console and AWS CLI](#console_migrate_apps) 

## `eksctl`
<a name="eksctl_migrate_apps"></a>

 **Migrate your applications to a new node group with `eksctl` ** 

For more information on using eksctl for migration, see [Unmanaged nodegroups](https://eksctl.io/usage/nodegroup-unmanaged/) in the `eksctl` documentation.

This procedure requires `eksctl` version `0.215.0` or later. You can check your version with the following command:

```
eksctl version
```

For instructions on how to install or upgrade `eksctl`, see [Installation](https://eksctl.io/installation) in the `eksctl` documentation.

**Note**  
This procedure only works for clusters and node groups that were created with `eksctl`.

1. Retrieve the name of your existing node groups, replacing *my-cluster* with your cluster name.

   ```
   eksctl get nodegroups --cluster=my-cluster
   ```

   An example output is as follows.

   ```
   CLUSTER      NODEGROUP          CREATED               MIN SIZE      MAX SIZE     DESIRED CAPACITY     INSTANCE TYPE     IMAGE ID
   default      standard-nodes   2019-05-01T22:26:58Z  1             4            3                    t3.medium         ami-05a71d034119ffc12
   ```

1. Launch a new node group with the following `eksctl` command. In the command, replace every *example value* with your own values. The version number can’t be later than the Kubernetes version for your control plane. Also, it can’t be more than two minor versions earlier than the Kubernetes version for your control plane. We recommend that you use the same version as your control plane.

   We recommend blocking Pod access to IMDS if the following conditions are true:
   + You plan to assign IAM roles to all of your Kubernetes service accounts so that Pods only have the minimum permissions that they need.
   + No Pods in the cluster require access to the Amazon EC2 instance metadata service (IMDS) for other reasons, such as retrieving the current AWS Region.

     For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).

     To block Pod access to IMDS, add the `--disable-pod-imds` option to the following command.
**Note**  
For more available flags and their descriptions, see https://eksctl.io/.

   ```
   eksctl create nodegroup \
     --cluster my-cluster \
     --version 1.35 \
     --name standard-nodes-new \
     --node-type t3.medium \
     --nodes 3 \
     --nodes-min 1 \
     --nodes-max 4 \
     --managed=false
   ```

1. When the previous command completes, verify that all of your nodes have reached the `Ready` state with the following command:

   ```
   kubectl get nodes
   ```
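
   If your nodes carry the `alpha.eksctl.io/nodegroup-name` label that `eksctl` typically applies to unmanaged node groups, you can limit the output to the new node group. This is a sketch that assumes the node group name used in the previous step:

   ```
   kubectl get nodes -l alpha.eksctl.io/nodegroup-name=standard-nodes-new
   ```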

1. Delete the original node group with the following command. In the command, replace every *example value* with your cluster and node group names:

   ```
   eksctl delete nodegroup --cluster my-cluster --name standard-nodes-old
   ```

## AWS Management Console and AWS CLI
<a name="console_migrate_apps"></a>

 **Migrate your applications to a new node group with the AWS Management Console and AWS CLI** 

1. Launch a new node group by following the steps that are outlined in [Create self-managed Amazon Linux nodes](launch-workers.md).

1. When your stack has finished creating, select it in the console and choose **Outputs**.

1.  Record the **NodeInstanceRole** for the node group that was created. You need this to add the new Amazon EKS nodes to your cluster.
**Note**  
If you attached any additional IAM policies to your old node group IAM role, attach those same policies to your new node group IAM role to maintain that functionality on the new group. This applies to you if you added permissions for the [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler), for example.

1. Update the security groups for both node groups so that they can communicate with each other. For more information, see [View Amazon EKS security group requirements for clusters](sec-group-reqs.md).

   1. Record the security group IDs for both node groups. This is shown as the **NodeSecurityGroup** value in the AWS CloudFormation stack outputs.

      You can use the following AWS CLI commands to get the security group IDs from the stack names. In these commands, `oldNodes` is the AWS CloudFormation stack name for your older node stack, and `newNodes` is the name of the stack that you are migrating to. Replace every *example value* with your own values.

      ```
      oldNodes="old_node_CFN_stack_name"
      newNodes="new_node_CFN_stack_name"
      
      oldSecGroup=$(aws cloudformation describe-stack-resources --stack-name $oldNodes \
      --query 'StackResources[?ResourceType==`AWS::EC2::SecurityGroup`].PhysicalResourceId' \
      --output text)
      newSecGroup=$(aws cloudformation describe-stack-resources --stack-name $newNodes \
      --query 'StackResources[?ResourceType==`AWS::EC2::SecurityGroup`].PhysicalResourceId' \
      --output text)
      ```

   1. Add ingress rules to each node security group so that they accept traffic from each other.

      The following AWS CLI commands add inbound rules to each security group that allow all traffic on all protocols from the other security group. This configuration allows Pods in each node group to communicate with each other while you’re migrating your workload to the new group.

      ```
      aws ec2 authorize-security-group-ingress --group-id $oldSecGroup \
      --source-group $newSecGroup --protocol -1
      aws ec2 authorize-security-group-ingress --group-id $newSecGroup \
      --source-group $oldSecGroup --protocol -1
      ```
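
      (Optional) You can confirm that each group now has an inbound rule that references the other group. This is a sketch that reuses the shell variables from the previous step:

      ```
      aws ec2 describe-security-groups --group-ids $oldSecGroup $newSecGroup \
      --query 'SecurityGroups[].{GroupId:GroupId,InboundFrom:IpPermissions[].UserIdGroupPairs[].GroupId}'
      ```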

1. Edit the `aws-auth` configmap to map the new node instance role in RBAC.

   ```
   kubectl edit configmap -n kube-system aws-auth
   ```

   Add a new `mapRoles` entry for the new node group.

   ```
   apiVersion: v1
   data:
     mapRoles: |
       - rolearn: ARN of instance role (not instance profile)
         username: system:node:{{EC2PrivateDNSName}}
         groups:
           - system:bootstrappers
            - system:nodes
       - rolearn: arn:aws:iam::111122223333:role/nodes-1-16-NodeInstanceRole-U11V27W93CX5
         username: system:node:{{EC2PrivateDNSName}}
         groups:
           - system:bootstrappers
           - system:nodes
   ```

   Replace the *ARN of instance role (not instance profile)* snippet with the **NodeInstanceRole** value that you recorded in a [previous step](#node-instance-role-step). Then, save and close the file to apply the updated configmap.
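
   (Optional) You can confirm that both role mappings are present before continuing. For example:

   ```
   kubectl describe configmap -n kube-system aws-auth
   ```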

1. Watch the status of your nodes and wait for your new nodes to join your cluster and reach the `Ready` status.

   ```
   kubectl get nodes --watch
   ```

1. (Optional) If you’re using the [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler), scale the deployment down to zero (0) replicas to avoid conflicting scaling actions.

   ```
   kubectl scale deployments/cluster-autoscaler --replicas=0 -n kube-system
   ```

1. Use the following command to taint each of the nodes that you want to remove with `NoSchedule`. This is so that new Pods aren’t scheduled or rescheduled on the nodes that you’re replacing. For more information, see [Taints and Tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) in the Kubernetes documentation.

   ```
   kubectl taint nodes node_name key=value:NoSchedule
   ```

   If you’re upgrading your nodes to a new Kubernetes version, you can identify and taint all of the nodes of a particular Kubernetes version (in this case, `1.33`) with the following code snippet. The version number can’t be later than the Kubernetes version of your control plane. It also can’t be more than two minor versions earlier than the Kubernetes version of your control plane. We recommend that you use the same version as your control plane.

   ```
   K8S_VERSION=1.33
   nodes=$(kubectl get nodes -o jsonpath="{.items[?(@.status.nodeInfo.kubeletVersion==\"v$K8S_VERSION\")].metadata.name}")
   for node in ${nodes[@]}
   do
       echo "Tainting $node"
       kubectl taint nodes $node key=value:NoSchedule
   done
   ```
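
   (Optional) You can confirm which nodes are tainted with the following command:

   ```
   kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
   ```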

1.  Determine your cluster’s DNS provider.

   ```
   kubectl get deployments -l k8s-app=kube-dns -n kube-system
   ```

   An example output is as follows. This cluster is using CoreDNS for DNS resolution, but your cluster might return `kube-dns` instead.

   ```
   NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
   coredns   1         1         1            1           31m
   ```

1. If your current deployment is running fewer than two replicas, scale out the deployment to two replicas. Replace *coredns* with `kube-dns` if your previous command output returned that instead.

   ```
   kubectl scale deployments/coredns --replicas=2 -n kube-system
   ```

1. Drain each of the nodes that you want to remove from your cluster with the following command:

   ```
   kubectl drain node_name --ignore-daemonsets --delete-local-data
   ```

   If you’re upgrading your nodes to a new Kubernetes version, identify and drain all of the nodes of a particular Kubernetes version (in this case, *1.33*) with the following code snippet.

   ```
   K8S_VERSION=1.33
   nodes=$(kubectl get nodes -o jsonpath="{.items[?(@.status.nodeInfo.kubeletVersion==\"v$K8S_VERSION\")].metadata.name}")
   for node in ${nodes[@]}
   do
       echo "Draining $node"
       kubectl drain $node --ignore-daemonsets --delete-local-data
   done
   ```
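
   (Optional) Drained nodes show `SchedulingDisabled` in the `STATUS` column of `kubectl get nodes`. To see whether any Pods are still running on a specific node, you can use a field selector. Replace `node_name` with the name of a drained node:

   ```
   kubectl get pods --all-namespaces --field-selector spec.nodeName=node_name
   ```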

1. After your old nodes have finished draining, revoke the security group inbound rules you authorized earlier. Then, delete the AWS CloudFormation stack to terminate the instances.
**Note**  
If you attached any additional IAM policies to your old node group IAM role, such as adding permissions for the [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler), detach those additional policies from the role before you delete your AWS CloudFormation stack.

   1. Revoke the inbound rules that you created for your node security groups earlier. In these commands, `oldNodes` is the AWS CloudFormation stack name for your older node stack, and `newNodes` is the name of the stack that you are migrating to.

      ```
      oldNodes="old_node_CFN_stack_name"
      newNodes="new_node_CFN_stack_name"
      
      oldSecGroup=$(aws cloudformation describe-stack-resources --stack-name $oldNodes \
      --query 'StackResources[?ResourceType==`AWS::EC2::SecurityGroup`].PhysicalResourceId' \
      --output text)
      newSecGroup=$(aws cloudformation describe-stack-resources --stack-name $newNodes \
      --query 'StackResources[?ResourceType==`AWS::EC2::SecurityGroup`].PhysicalResourceId' \
      --output text)
      aws ec2 revoke-security-group-ingress --group-id $oldSecGroup \
      --source-group $newSecGroup --protocol -1
      aws ec2 revoke-security-group-ingress --group-id $newSecGroup \
      --source-group $oldSecGroup --protocol -1
      ```

   1. Open the [AWS CloudFormation console](https://console.aws.amazon.com/cloudformation/).

   1. Select your old node stack.

   1. Choose **Delete**.

   1. In the **Delete stack** confirmation dialog box, choose **Delete stack**.

1. Edit the `aws-auth` configmap to remove the old node instance role from RBAC.

   ```
   kubectl edit configmap -n kube-system aws-auth
   ```

   Delete the `mapRoles` entry for the old node group.

   ```
   apiVersion: v1
   data:
     mapRoles: |
       - rolearn: arn:aws:iam::111122223333:role/nodes-1-16-NodeInstanceRole-W70725MZQFF8
         username: system:node:{{EC2PrivateDNSName}}
         groups:
           - system:bootstrappers
           - system:nodes
       - rolearn: arn:aws:iam::111122223333:role/nodes-1-15-NodeInstanceRole-U11V27W93CX5
         username: system:node:{{EC2PrivateDNSName}}
         groups:
           - system:bootstrappers
            - system:nodes
   ```

   Save and close the file to apply the updated configmap.

1. (Optional) If you are using the [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler), scale the deployment back to one replica.
**Note**  
You must also tag your new Auto Scaling group appropriately (for example, `k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster`) and update the command for your Cluster Autoscaler deployment to point to the newly tagged Auto Scaling group. For more information, see [Cluster Autoscaler on AWS](https://github.com/kubernetes/autoscaler/tree/cluster-autoscaler-release-1.3/cluster-autoscaler/cloudprovider/aws).

   ```
   kubectl scale deployments/cluster-autoscaler --replicas=1 -n kube-system
   ```

1. (Optional) Verify that you’re using the latest version of the [Amazon VPC CNI plugin for Kubernetes](https://github.com/aws/amazon-vpc-cni-k8s). You might need to update your CNI version to use the latest supported instance types. For more information, see [Assign IPs to Pods with the Amazon VPC CNI](managing-vpc-cni.md).

1. If your cluster is using `kube-dns` for DNS resolution (see the [earlier step](#migrate-determine-dns-step) where you determined your cluster’s DNS provider), scale in the `kube-dns` deployment to one replica.

   ```
   kubectl scale deployments/kube-dns --replicas=1 -n kube-system
   ```

# Update an AWS CloudFormation node stack
<a name="update-stack"></a>

This topic describes how you can update an existing AWS CloudFormation self-managed node stack with a new AMI. You can use this procedure to update your nodes to a new version of Kubernetes following a cluster update. Otherwise, you can update to the latest Amazon EKS optimized AMI for an existing Kubernetes version.

**Important**  
This topic covers node updates for self-managed nodes. If you are using [managed node groups](managed-node-groups.md), see [Update a managed node group for your cluster](update-managed-node-group.md).

The latest default Amazon EKS node AWS CloudFormation template is configured to launch an instance with the new AMI into your cluster before removing an old one, one at a time. This configuration ensures that you always have your Auto Scaling group’s desired count of active instances in your cluster during the rolling update.

**Note**  
This method isn’t supported for node groups that were created with `eksctl`. If you created your cluster or node group with `eksctl`, see [Migrate applications to a new node group](migrate-stack.md).

1. Determine the DNS provider for your cluster.

   ```
   kubectl get deployments -l k8s-app=kube-dns -n kube-system
   ```

   An example output is as follows. This cluster is using CoreDNS for DNS resolution, but your cluster might return `kube-dns` instead. Your output might look different depending on the version of `kubectl` that you’re using.

   ```
   NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
   coredns   1         1         1            1           31m
   ```

1. If your current deployment is running fewer than two replicas, scale out the deployment to two replicas. Replace *coredns* with `kube-dns` if your previous command output returned that instead.

   ```
   kubectl scale deployments/coredns --replicas=2 -n kube-system
   ```

1. (Optional) If you’re using the Kubernetes [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md), scale the deployment down to zero (0) replicas to avoid conflicting scaling actions.

   ```
   kubectl scale deployments/cluster-autoscaler --replicas=0 -n kube-system
   ```

1.  Determine the instance type and desired instance count of your current node group. You enter these values later when you update the AWS CloudFormation template for the group.

   1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

   1. In the left navigation pane, choose **Launch Configurations**, and note the instance type for your existing node launch configuration.

   1. In the left navigation pane, choose **Auto Scaling Groups**, and note the **Desired** instance count for your existing node Auto Scaling group.

1. Open the [AWS CloudFormation console](https://console.aws.amazon.com/cloudformation/).

1. Select your node group stack, and then choose **Update**.

1. Select **Replace current template** and select **Amazon S3 URL**.

1. For **Amazon S3 URL**, paste the following URL into the text area to ensure that you’re using the latest version of the node AWS CloudFormation template. Then, choose **Next**:

   ```
   https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2022-12-23/amazon-eks-nodegroup.yaml
   ```

1. On the **Specify stack details** page, fill out the following parameters, and choose **Next**:
   +  **NodeAutoScalingGroupDesiredCapacity** – Enter the desired instance count that you recorded in a [previous step](#existing-worker-settings-step). Or, enter your new desired number of nodes to scale to when your stack is updated.
   +  **NodeAutoScalingGroupMaxSize** – Enter the maximum number of nodes to which your node Auto Scaling group can scale out. This value must be at least one node more than your desired capacity. This is so that you can perform a rolling update of your nodes without reducing your node count during the update.
   +  **NodeInstanceType** – Choose the instance type you recorded in a [previous step](#existing-worker-settings-step). Alternatively, choose a different instance type for your nodes. Before choosing a different instance type, review [Choose an optimal Amazon EC2 node instance type](choosing-instance-type.md). Each Amazon EC2 instance type supports a maximum number of elastic network interfaces (network interfaces), and each network interface supports a maximum number of IP addresses. Because each worker node and Pod is assigned its own IP address, it’s important to choose an instance type that supports the maximum number of Pods that you want to run on each Amazon EC2 node. For a list of the number of network interfaces and IP addresses supported by instance types, see [IP addresses per network interface per instance type](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI). For example, the `m5.large` instance type supports a maximum of 30 IP addresses for the worker node and Pods.
**Note**  
The supported instance types for the latest version of the [Amazon VPC CNI plugin for Kubernetes](https://github.com/aws/amazon-vpc-cni-k8s) are shown in [vpc_ip_resource_limit.go](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/vpc/vpc_ip_resource_limit.go) on GitHub. You might need to update your Amazon VPC CNI plugin for Kubernetes version to use the latest supported instance types. For more information, see [Assign IPs to Pods with the Amazon VPC CNI](managing-vpc-cni.md).
**Important**  
Some instance types might not be available in all AWS Regions.
   +  **NodeImageIdSSMParam** – The Amazon EC2 Systems Manager parameter of the AMI ID that you want to update to. The following value uses the latest Amazon EKS optimized AMI for Kubernetes version `1.35`.

     ```
     /aws/service/eks/optimized-ami/1.35/amazon-linux-2/recommended/image_id
     ```

      You can replace *1.35* with a supported [Kubernetes version](https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html) that’s the same as, or up to one version earlier than, the Kubernetes version running on your control plane. We recommend that you keep your nodes at the same version as your control plane. You can also replace *amazon-linux-2* with a different AMI type. For more information, see [Retrieve recommended Amazon Linux AMI IDs](retrieve-ami-id.md). To preview the AMI ID that this parameter resolves to, see the example after this parameter list.
**Note**  
Using the Amazon EC2 Systems Manager parameter enables you to update your nodes in the future without having to look up and specify an AMI ID. If your AWS CloudFormation stack is using this value, any stack update always launches the latest recommended Amazon EKS optimized AMI for your specified Kubernetes version. This is the case even if you don’t change any values in the template.
   +  **NodeImageId** – To use your own custom AMI, enter the ID for the AMI to use.
**Important**  
This value overrides any value specified for **NodeImageIdSSMParam**. If you want to use the **NodeImageIdSSMParam** value, ensure that the value for **NodeImageId** is blank.
   +  **DisableIMDSv1** – By default, each node supports the Instance Metadata Service Version 1 (IMDSv1) and IMDSv2. However, you can disable IMDSv1. Select **true** if you don’t want any nodes or any Pods scheduled in the node group to use IMDSv1. For more information about IMDS, see [Configuring the instance metadata service](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html). If you’ve implemented IAM roles for service accounts, assign necessary permissions directly to all Pods that require access to AWS services. This way, no Pods in your cluster require access to IMDS for other reasons, such as retrieving the current AWS Region. Then, you can also disable access to IMDSv2 for Pods that don’t use host networking. For more information, see [Restrict access to the instance profile assigned to the worker node](https://aws.github.io/aws-eks-best-practices/security/docs/iam/#restrict-access-to-the-instance-profile-assigned-to-the-worker-node).
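
   (Optional) Before you update the stack, you can preview the AMI ID that the **NodeImageIdSSMParam** value currently resolves to. This is a sketch that assumes the Kubernetes version and AMI type shown earlier; replace *region-code* with your cluster’s AWS Region:

   ```
   aws ssm get-parameter \
       --name /aws/service/eks/optimized-ami/1.35/amazon-linux-2/recommended/image_id \
       --region region-code \
       --query "Parameter.Value" --output text
   ```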

1. (Optional) On the **Options** page, tag your stack resources. Choose **Next**.

1. On the **Review** page, review your information, acknowledge that the stack might create IAM resources, and then choose **Update stack**.
**Note**  
The update of each node in the cluster takes several minutes. Wait for the update of all nodes to complete before performing the next steps.

1. If your cluster’s DNS provider is `kube-dns`, scale in the `kube-dns` deployment to one replica.

   ```
   kubectl scale deployments/kube-dns --replicas=1 -n kube-system
   ```

1. (Optional) If you are using the Kubernetes [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md), scale the deployment back to your desired amount of replicas.

   ```
   kubectl scale deployments/cluster-autoscaler --replicas=1 -n kube-system
   ```

1. (Optional) Verify that you’re using the latest version of the [Amazon VPC CNI plugin for Kubernetes](https://github.com/aws/amazon-vpc-cni-k8s). You might need to update your Amazon VPC CNI plugin for Kubernetes version to use the latest supported instance types. For more information, see [Assign IPs to Pods with the Amazon VPC CNI](managing-vpc-cni.md).

# Simplify compute management with AWS Fargate
<a name="fargate"></a>

This topic discusses using Amazon EKS to run Kubernetes Pods on AWS Fargate. Fargate is a technology that provides on-demand, right-sized compute capacity for [containers](https://aws.amazon.com/what-are-containers). With Fargate, you don’t have to provision, configure, or scale groups of virtual machines on your own to run containers. You also don’t need to choose server types, decide when to scale your node groups, or optimize cluster packing.

You can control which Pods start on Fargate and how they run with [Fargate profiles](fargate-profile.md). Fargate profiles are defined as part of your Amazon EKS cluster. Amazon EKS integrates Kubernetes with Fargate by using controllers that are built by AWS using the upstream, extensible model provided by Kubernetes. These controllers run as part of the Amazon EKS managed Kubernetes control plane and are responsible for scheduling native Kubernetes Pods onto Fargate. The Fargate controllers include a new scheduler that runs alongside the default Kubernetes scheduler in addition to several mutating and validating admission controllers. When you start a Pod that meets the criteria for running on Fargate, the Fargate controllers that are running in the cluster recognize, update, and schedule the Pod onto Fargate.

This topic describes the different components of Pods that run on Fargate, and calls out special considerations for using Fargate with Amazon EKS.

## AWS Fargate considerations
<a name="fargate-considerations"></a>

Here are some things to consider about using Fargate on Amazon EKS.
+ Each Pod that runs on Fargate has its own compute boundary. They don’t share the underlying kernel, CPU resources, memory resources, or elastic network interface with another Pod.
+ Network Load Balancers and Application Load Balancers (ALBs) can be used with Fargate with IP targets only. For more information, see [Create a network load balancer](network-load-balancing.md#network-load-balancer) and [Route application and HTTP traffic with Application Load Balancers](alb-ingress.md).
+ Services that expose Pods running on Fargate are only supported with the IP target type, not the node IP target type. The recommended way to check connectivity between a service running on a managed node and a service running on Fargate is to connect by using the service name.
+ Pods must match a Fargate profile at the time that they’re scheduled to run on Fargate. Pods that don’t match a Fargate profile might be stuck as `Pending`. If a matching Fargate profile exists, you can delete pending Pods that you have created to reschedule them onto Fargate.
+ Daemonsets aren’t supported on Fargate. If your application requires a daemon, reconfigure that daemon to run as a sidecar container in your Pods.
+ Privileged containers aren’t supported on Fargate.
+ Pods running on Fargate can’t specify `HostPort` or `HostNetwork` in the Pod manifest.
+ The default `nofile` and `nproc` soft limit is 1024 and the hard limit is 65535 for Fargate Pods.
+ GPUs aren’t currently available on Fargate.
+ Pods that run on Fargate are only supported on private subnets (with NAT gateway access to AWS services, but not a direct route to an Internet Gateway), so your cluster’s VPC must have private subnets available. For clusters without outbound internet access, see [Deploy private clusters with limited internet access](private-clusters.md).
+ You can use the [Adjust pod resources with Vertical Pod Autoscaler](vertical-pod-autoscaler.md) to set the initial correct size of CPU and memory for your Fargate Pods, and then use the [Scale pod deployments with Horizontal Pod Autoscaler](horizontal-pod-autoscaler.md) to scale those Pods. If you want the Vertical Pod Autoscaler to automatically re-deploy Pods to Fargate with larger CPU and memory combinations, set the mode for the Vertical Pod Autoscaler to either `Auto` or `Recreate` to ensure correct functionality. For more information, see the [Vertical Pod Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#quick-start) documentation on GitHub.
+ DNS resolution and DNS hostnames must be enabled for your VPC. For more information, see [Viewing and updating DNS support for your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-updating).
+ Amazon EKS Fargate adds defense-in-depth for Kubernetes applications by isolating each Pod within a Virtual Machine (VM). This VM boundary prevents access to host-based resources used by other Pods in the event of a container escape, a common method of attacking containerized applications to gain access to resources outside of the container.

  Using Amazon EKS doesn’t change your responsibilities under the [shared responsibility model](security.md). You should carefully consider the configuration of cluster security and governance controls. The safest way to isolate an application is always to run it in a separate cluster.
+ Fargate profiles support specifying subnets from VPC secondary CIDR blocks. You might want to specify a secondary CIDR block because the number of IP addresses available in a subnet limits the number of Pods that can be created in the cluster. By using different subnets for Pods, you can increase the number of available IP addresses. For more information, see [Adding IPv4 CIDR blocks to a VPC](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Subnets.html#vpc-resize).
+ The Amazon EC2 instance metadata service (IMDS) isn’t available to Pods that are deployed to Fargate nodes. If you have Pods that are deployed to Fargate that need IAM credentials, assign them to your Pods using [IAM roles for service accounts](iam-roles-for-service-accounts.md). If your Pods need access to other information available through IMDS, then you must hard code this information into your Pod spec. This includes the AWS Region or Availability Zone that a Pod is deployed to.
+ You can’t deploy Fargate Pods to AWS Outposts, AWS Wavelength, or AWS Local Zones.
+ Amazon EKS must periodically patch Fargate Pods to keep them secure. We attempt the updates in a way that reduces impact, but there are times when Pods must be deleted if they aren’t successfully evicted. There are some actions you can take to minimize disruption. For more information, see [Set actions for AWS Fargate OS patching events](fargate-pod-patching.md).
+ The [Amazon VPC CNI plugin for Amazon EKS](https://github.com/aws/amazon-vpc-cni-plugins) is installed on Fargate nodes. You can’t use [Alternate CNI plugins for Amazon EKS clusters](alternate-cni-plugins.md) with Fargate nodes.
+ A Pod running on Fargate automatically mounts an Amazon EFS file system, without needing manual driver installation steps. You can’t use dynamic persistent volume provisioning with Fargate nodes, but you can use static provisioning.
+ Amazon EKS doesn’t support Fargate Spot.
+ You can’t mount Amazon EBS volumes to Fargate Pods.
+ You can run the Amazon EBS CSI controller on Fargate nodes, but the Amazon EBS CSI node DaemonSet can only run on Amazon EC2 instances.
+ After a [Kubernetes Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/) is marked `Completed` or `Failed`, the Pods that the Job creates normally continue to exist. This behavior allows you to view your logs and results, but with Fargate you will incur costs if you don’t clean up the Job afterwards.

  To automatically delete the related Pods after a Job completes or fails, you can specify a time period using the time-to-live (TTL) controller. The following example shows specifying `.spec.ttlSecondsAfterFinished` in your Job manifest.

  ```
  apiVersion: batch/v1
  kind: Job
  metadata:
    name: busybox
  spec:
    template:
      spec:
        containers:
        - name: busybox
          image: busybox
          command: ["/bin/sh", "-c", "sleep 10"]
        restartPolicy: Never
    ttlSecondsAfterFinished: 60 # <-- TTL controller
  ```

## Fargate Comparison Table
<a name="_fargate_comparison_table"></a>


| Criteria |  AWS Fargate | 
| --- | --- | 
|  Can be deployed to [AWS Outposts](https://docs.aws.amazon.com/outposts/latest/userguide/what-is-outposts.html)   |  No  | 
|  Can be deployed to an [AWS Local Zone](local-zones.md)   |  No  | 
|  Can run containers that require Windows  |  No  | 
|  Can run containers that require Linux  |  Yes  | 
|  Can run workloads that require the Inferentia chip  |  No  | 
|  Can run workloads that require a GPU  |  No  | 
|  Can run workloads that require Arm processors  |  No  | 
|  Can run AWS [Bottlerocket](https://aws.amazon.com/bottlerocket/)   |  No  | 
|  Pods share a kernel runtime environment with other Pods  |  No – Each Pod has a dedicated kernel  | 
|  Pods share CPU, memory, storage, and network resources with other Pods.  |  No – Each Pod has dedicated resources and can be sized independently to maximize resource utilization.  | 
|  Pods can use more hardware and memory than requested in Pod specs  |  No – The Pod can be re-deployed using a larger vCPU and memory configuration though.  | 
|  Must deploy and manage Amazon EC2 instances  |  No  | 
|  Must secure, maintain, and patch the operating system of Amazon EC2 instances  |  No  | 
|  Can provide bootstrap arguments at deployment of a node, such as extra [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) arguments.  |  No  | 
|  Can assign IP addresses to Pods from a different CIDR block than the IP address assigned to the node.  |  No  | 
|  Can SSH into node  |  No – There’s no node host operating system to SSH to.  | 
|  Can deploy your own custom AMI to nodes  |  No  | 
|  Can deploy your own custom CNI to nodes  |  No  | 
|  Must update node AMI on your own  |  No  | 
|  Must update node Kubernetes version on your own  |  No – You don’t manage nodes.  | 
|  Can use Amazon EBS storage with Pods  |  No  | 
|  Can use Amazon EFS storage with Pods  |   [Yes](efs-csi.md)   | 
|  Can use Amazon FSx for Lustre storage with Pods  |  No  | 
|  Can use Network Load Balancer for services  |  Yes, when using IP targets. For more information, see [Create a network load balancer](network-load-balancing.md#network-load-balancer).  | 
|  Pods can run in a public subnet  |  No  | 
|  Can assign different VPC security groups to individual Pods  |  Yes  | 
|  Can run Kubernetes DaemonSets  |  No  | 
|  Support `HostPort` and `HostNetwork` in the Pod manifest  |  No  | 
|   AWS Region availability  |   [Some Amazon EKS supported regions](https://docs.aws.amazon.com/general/latest/gr/eks.html)   | 
|  Can run containers on Amazon EC2 dedicated hosts  |  No  | 
|  Pricing  |  Cost of an individual Fargate memory and CPU configuration. Each Pod has its own cost. For more information, see [AWS Fargate pricing](https://aws.amazon.com/fargate/pricing/).  | 

# Get started with AWS Fargate for your cluster
<a name="fargate-getting-started"></a>

This topic describes how to get started running Pods on AWS Fargate with your Amazon EKS cluster.

If you restrict access to the public endpoint of your cluster using CIDR blocks, we recommend that you also enable private endpoint access. This way, Fargate Pods can communicate with the cluster. Without the private endpoint enabled, the CIDR blocks that you specify for public access must include the outbound sources from your VPC. For more information, see [Cluster API server endpoint](cluster-endpoint.md).

**Prerequisite**  
An existing cluster. If you don’t already have an Amazon EKS cluster, see [Get started with Amazon EKS](getting-started.md).

## Step 1: Ensure that existing nodes can communicate with Fargate Pods
<a name="fargate-gs-check-compatibility"></a>

If you’re working with a new cluster with no nodes, or a cluster with only managed node groups (see [Simplify node lifecycle with managed node groups](managed-node-groups.md)), you can skip to [Step 2: Create a Fargate Pod execution role](#fargate-sg-pod-execution-role).

Assume that you’re working with an existing cluster that already has nodes that are associated with it. Make sure that Pods on these nodes can communicate freely with the Pods that are running on Fargate. Pods that are running on Fargate are automatically configured to use the cluster security group for the cluster that they’re associated with. Ensure that any existing nodes in your cluster can send and receive traffic to and from the cluster security group. Managed node groups are automatically configured to use the cluster security group as well, so you don’t need to modify or check them for this compatibility (see [Simplify node lifecycle with managed node groups](managed-node-groups.md)).

For existing node groups that were created with `eksctl` or the Amazon EKS managed AWS CloudFormation templates, you can add the cluster security group to the nodes manually. Or, alternatively, you can modify the Auto Scaling group launch template for the node group to attach the cluster security group to the instances. For more information, see [Changing an instance’s security groups](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html#SG_Changing_Group_Membership) in the *Amazon VPC User Guide*.

You can check for a security group for your cluster in the AWS Management Console under the **Networking** section for the cluster. Or, you can do this using the following AWS CLI command. When using this command, replace `<my-cluster>` with the name of your cluster.

```
aws eks describe-cluster --name <my-cluster> --query cluster.resourcesVpcConfig.clusterSecurityGroupId
```
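
To check which security groups are attached to a specific existing node, you can describe its instance. This is a sketch that assumes a hypothetical instance ID; replace it with the ID of one of your nodes:

```
aws ec2 describe-instances --instance-ids i-1234567890abcdef0 \
    --query 'Reservations[].Instances[].SecurityGroups[].GroupId'
```

If the cluster security group that the previous command returned isn’t in the list, add it to the instance or to the node group’s launch template as described earlier.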

## Step 2: Create a Fargate Pod execution role
<a name="fargate-sg-pod-execution-role"></a>

When your cluster creates Pods on AWS Fargate, the components that run on the Fargate infrastructure must make calls to AWS APIs on your behalf. The Amazon EKS Pod execution role provides the IAM permissions to do this. To create an AWS Fargate Pod execution role, see [Amazon EKS Pod execution IAM role](pod-execution-role.md).

**Note**  
If you created your cluster with `eksctl` using the `--fargate` option, your cluster already has a Pod execution role that you can find in the IAM console with the pattern `eksctl-my-cluster-FargatePodExecutionRole-ABCDEFGHIJKL`. Similarly, if you use `eksctl` to create your Fargate profiles, `eksctl` creates your Pod execution role if one isn’t already created.

## Step 3: Create a Fargate profile for your cluster
<a name="fargate-gs-create-profile"></a>

Before you can schedule Pods that are running on Fargate in your cluster, you must define a Fargate profile that specifies which Pods use Fargate when they’re launched. For more information, see [Define which Pods use AWS Fargate when launched](fargate-profile.md).

**Note**  
If you created your cluster with `eksctl` using the `--fargate` option, then a Fargate profile is already created for your cluster with selectors for all Pods in the `kube-system` and `default` namespaces. Use the following procedure to create Fargate profiles for any other namespaces you would like to use with Fargate.

You can create a Fargate profile using either of these tools:
+  [`eksctl`](#eksctl_fargate_profile_create) 
+  [AWS Management Console](#console_fargate_profile_create) 

### `eksctl`
<a name="eksctl_fargate_profile_create"></a>

This procedure requires `eksctl` version `0.215.0` or later. You can check your version with the following command:

```
eksctl version
```

For instructions on how to install or upgrade `eksctl`, see [Installation](https://eksctl.io/installation) in the `eksctl` documentation.

 **To create a Fargate profile with `eksctl` ** 

Create your Fargate profile with the following `eksctl` command, replacing every `<example value>` with your own values. You’re required to specify a namespace. However, the `--labels` option isn’t required.

```
eksctl create fargateprofile \
    --cluster <my-cluster> \
    --name <my-fargate-profile> \
    --namespace <my-kubernetes-namespace> \
    --labels <key=value>
```

You can use certain wildcards for `<my-kubernetes-namespace>` and `<key=value>` labels. For more information, see [Fargate profile wildcards](fargate-profile.md#fargate-profile-wildcards).

### AWS Management Console
<a name="console_fargate_profile_create"></a>

 **To create a Fargate profile with AWS Management Console ** 

1. Open the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. Choose the cluster to create a Fargate profile for.

1. Choose the **Compute** tab.

1. Under **Fargate profiles**, choose **Add Fargate profile**.

1. On the **Configure Fargate profile** page, do the following:

   1. For **Name**, enter a name for your Fargate profile. The name must be unique.

   1. For **Pod execution role**, choose the Pod execution role to use with your Fargate profile. Only the IAM roles with the `eks-fargate-pods.amazonaws.com` service principal are shown. If you don’t see any roles listed, you must create one. For more information, see [Amazon EKS Pod execution IAM role](pod-execution-role.md).

   1. Modify the selected **Subnets** as needed.
**Note**  
Only private subnets are supported for Pods that are running on Fargate.

   1. For **Tags**, you can optionally tag your Fargate profile. These tags don’t propagate to other resources that are associated with the profile such as Pods.

   1. Choose **Next**.

1. On the **Configure Pod selection** page, do the following:

   1. For **Namespace**, enter a namespace to match for Pods.
      + You can use specific namespaces to match, such as `kube-system` or `default`.
      + You can use certain wildcards (for example, `prod-*`) to match multiple namespaces (for example, `prod-deployment` and `prod-test`). For more information, see [Fargate profile wildcards](fargate-profile.md#fargate-profile-wildcards).

   1. (Optional) Add Kubernetes labels to the selector. Specifically, add them to the one that the Pods in the specified namespace need to match.
      + You can add the label `infrastructure: fargate` to the selector so that only Pods in the specified namespace that also have the `infrastructure: fargate` Kubernetes label match the selector.
      + You can use certain wildcards (for example, `key?: value?`) to match multiple namespaces (for example, `keya: valuea` and `keyb: valueb`). For more information, see [Fargate profile wildcards](fargate-profile.md#fargate-profile-wildcards).

   1. Choose **Next**.

1. On the **Review and create** page, review the information for your Fargate profile and choose **Create**.

## Step 4: Update CoreDNS
<a name="fargate-gs-coredns"></a>

By default, CoreDNS is configured to run on Amazon EC2 infrastructure on Amazon EKS clusters. If you want to *only* run your Pods on Fargate in your cluster, complete the following steps.

**Note**  
If you created your cluster with `eksctl` using the `--fargate` option, then you can skip to [Next steps](#fargate-gs-next-steps).

1. Create a Fargate profile for CoreDNS with the following command. Replace `<my-cluster>` with your cluster name, `<111122223333>` with your account ID, `<AmazonEKSFargatePodExecutionRole>` with the name of your Pod execution role, and `<000000000000000a>`, `<000000000000000b>`, and `<000000000000000c>` with the IDs of your private subnets. If you don’t have a Pod execution role, you must create one first (see [Step 2: Create a Fargate Pod execution role](#fargate-sg-pod-execution-role)).
**Important**  
The role ARN can’t include a [path](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html#identifiers-friendly-names) other than `/`. For example, if the name of your role is `development/apps/AmazonEKSFargatePodExecutionRole`, you need to change it to `AmazonEKSFargatePodExecutionRole` when specifying the ARN for the role. The format of the role ARN must be ` arn:aws:iam::<111122223333>:role/<AmazonEKSFargatePodExecutionRole>`.

   ```
   aws eks create-fargate-profile \
       --fargate-profile-name coredns \
       --cluster-name <my-cluster> \
       --pod-execution-role-arn arn:aws:iam::<111122223333>:role/<AmazonEKSFargatePodExecutionRole> \
       --selectors namespace=kube-system,labels={k8s-app=kube-dns} \
       --subnets subnet-<000000000000000a> subnet-<000000000000000b> subnet-<000000000000000c>
   ```

1. Trigger a rollout of the `coredns` deployment.

   ```
   kubectl rollout restart -n kube-system deployment coredns
   ```
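
   (Optional) After the rollout completes, you can verify where the CoreDNS Pods are scheduled. Nodes that Fargate provisions typically have names that start with `fargate-`:

   ```
   kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
   ```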

## Next steps
<a name="fargate-gs-next-steps"></a>
+ You can start migrating your existing applications to run on Fargate with the following workflow.

  1.  [Create a Fargate profile](fargate-profile.md#create-fargate-profile) that matches your application’s Kubernetes namespace and Kubernetes labels.

  1. Delete and re-create any existing Pods so that they’re scheduled on Fargate. Modify the `<namespace>` and `<deployment-type>` to update your specific Pods.

     ```
     kubectl rollout restart -n <namespace> deployment <deployment-type>
     ```
+ Deploy the AWS Load Balancer Controller to allow Ingress objects for your Pods running on Fargate. For more information, see [Route application and HTTP traffic with Application Load Balancers](alb-ingress.md).
+ You can use the [Adjust pod resources with Vertical Pod Autoscaler](vertical-pod-autoscaler.md) to set the initial correct size of CPU and memory for your Fargate Pods, and then use the [Scale pod deployments with Horizontal Pod Autoscaler](horizontal-pod-autoscaler.md) to scale those Pods. If you want the Vertical Pod Autoscaler to automatically re-deploy Pods to Fargate with higher CPU and memory combinations, set the Vertical Pod Autoscaler’s mode to either `Auto` or `Recreate`. This is to ensure correct functionality. For more information, see the [Vertical Pod Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#quick-start) documentation on GitHub.
+ You can set up the [AWS Distro for OpenTelemetry](https://aws.amazon.com/otel) (ADOT) collector for application monitoring by following [these instructions](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-EKS-otel.html).

# Define which Pods use AWS Fargate when launched
<a name="fargate-profile"></a>

Before you schedule Pods on Fargate in your cluster, you must define at least one Fargate profile that specifies which Pods use Fargate when launched.

As an administrator, you can use a Fargate profile to declare which Pods run on Fargate. You can do this through the profile’s selectors. You can add up to five selectors to each profile. Each selector must contain a namespace. The selector can also include labels. The label field consists of multiple optional key-value pairs. Pods that match a selector are scheduled on Fargate. Pods are matched using a namespace and the labels that are specified in the selector. If a namespace selector is defined without labels, Amazon EKS attempts to schedule all the Pods that run in that namespace onto Fargate using the profile. If a to-be-scheduled Pod matches any of the selectors in the Fargate profile, then that Pod is scheduled on Fargate.

If a Pod matches multiple Fargate profiles, you can specify which profile a Pod uses by adding the following Kubernetes label to the Pod specification: `eks.amazonaws.com/fargate-profile: my-fargate-profile`. The Pod must match a selector in that profile to be scheduled onto Fargate. Kubernetes affinity and anti-affinity rules don’t apply and aren’t necessary with Amazon EKS Fargate Pods.
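
As a minimal sketch, assuming a profile named `my-fargate-profile` whose selector matches the `default` namespace, the following Pod is pinned to that profile by the label in its metadata. The Pod name and container command are hypothetical examples:

```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: profile-pinned-example
  namespace: default
  labels:
    eks.amazonaws.com/fargate-profile: my-fargate-profile
spec:
  containers:
    - name: app
      image: busybox
      command: ["/bin/sh", "-c", "sleep 3600"]
EOF
```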

When you create a Fargate profile, you must specify a Pod execution role. This execution role is for the Amazon EKS components that run on the Fargate infrastructure using the profile. It’s added to the cluster’s Kubernetes [Role Based Access Control](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) (RBAC) for authorization. That way, the `kubelet` that runs on the Fargate infrastructure can register with your Amazon EKS cluster and appear in your cluster as a node. The Pod execution role also provides IAM permissions to the Fargate infrastructure to allow read access to Amazon ECR image repositories. For more information, see [Amazon EKS Pod execution IAM role](pod-execution-role.md).

Fargate profiles can’t be changed. However, you can create a new updated profile to replace an existing profile, and then delete the original.

**Note**  
Any Pods that are running using a Fargate profile are stopped and put into a pending state when the profile is deleted.

If any Fargate profiles in a cluster are in the `DELETING` status, you must wait until after the Fargate profile is deleted before you create other profiles in that cluster.

**Note**  
Fargate does not currently support Kubernetes [topologySpreadConstraints](https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/).

Amazon EKS and Fargate spread Pods across each of the subnets that are defined in the Fargate profile. However, you might end up with an uneven spread. If you must have an even spread, use two Fargate profiles. Even spread is important in scenarios where you want to deploy two replicas and don’t want any downtime. We recommend that each profile has only one subnet.

## Fargate profile components
<a name="fargate-profile-components"></a>

The following components are contained in a Fargate profile.

 **Pod execution role**   
When your cluster creates Pods on AWS Fargate, the `kubelet` that’s running on the Fargate infrastructure must make calls to AWS APIs on your behalf. For example, it needs to make calls to pull container images from Amazon ECR. The Amazon EKS Pod execution role provides the IAM permissions to do this.  
When you create a Fargate profile, you must specify a Pod execution role to use with your Pods. This role is added to the cluster’s Kubernetes [Role-based access control](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) (RBAC) for authorization. This is so that the `kubelet` that’s running on the Fargate infrastructure can register with your Amazon EKS cluster and appear in your cluster as a node. For more information, see [Amazon EKS Pod execution IAM role](pod-execution-role.md).

 **Subnets**   
The IDs of subnets to launch Pods into that use this profile. At this time, Pods that are running on Fargate aren’t assigned public IP addresses. Therefore, only private subnets with no direct route to an Internet Gateway are accepted for this parameter.

 **Selectors**   
The selectors to match for Pods to use this Fargate profile. You might specify up to five selectors in a Fargate profile. The selectors have the following components:  
+  **Namespace** – You must specify a namespace for a selector. The selector only matches Pods that are created in this namespace. However, you can create multiple selectors to target multiple namespaces.
+  **Labels** – You can optionally specify Kubernetes labels to match for the selector. The selector only matches Pods that have all of the labels that are specified in the selector.

## Fargate profile wildcards
<a name="fargate-profile-wildcards"></a>

In addition to characters allowed by Kubernetes, you’re allowed to use `*` and `?` in the selector criteria for namespaces, label keys, and label values:
+  `*` represents none, one, or multiple characters. For example, `prod*` can represent `prod` and `prod-metrics`.
+  `?` represents a single character (for example, `value?` can represent `valuea`). However, it can’t represent `value` and `value-a`, because `?` can only represent exactly one character.

These wildcard characters can be used in any position and in combination (for example, `prod*`, `*dev`, and `frontend*?`). Other wildcards and forms of pattern matching, such as regular expressions, aren’t supported.
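
For example, a sketch of a profile whose selector matches every namespace that starts with `prod`, using hypothetical profile, role, and subnet values:

```
aws eks create-fargate-profile \
    --fargate-profile-name prod-wildcard \
    --cluster-name my-cluster \
    --pod-execution-role-arn arn:aws:iam::111122223333:role/AmazonEKSFargatePodExecutionRole \
    --selectors 'namespace=prod*' \
    --subnets subnet-000000000000000a subnet-000000000000000b
```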

If there are multiple matching profiles for the namespace and labels in the Pod spec, Fargate picks the profile based on alphanumeric sorting by profile name. For example, if both profile A (with the name `beta-workload`) and profile B (with the name `prod-workload`) have selectors that match the Pods to be launched, Fargate picks profile A (`beta-workload`) for the Pods, and the Pods are labeled with profile A (for example, `eks.amazonaws.com/fargate-profile=beta-workload`).

If you want to migrate existing Fargate Pods to new profiles that use wildcards, there are two ways to do so:
+ Create a new profile with matching selectors, then delete the old profiles. Pods labeled with old profiles are rescheduled to new matching profiles.
+ If you want to migrate workloads but aren’t sure what Fargate labels are on each Fargate Pod, you can use the following method. Create a new profile with a name that sorts alphanumerically first among the profiles on the same cluster. Then, recycle the Fargate Pods that need to be migrated to new profiles.

## Create a Fargate profile
<a name="create-fargate-profile"></a>

This section describes how to create a Fargate profile. You must also have created a Pod execution role to use for your Fargate profile. For more information, see [Amazon EKS Pod execution IAM role](pod-execution-role.md). Pods that are running on Fargate are only supported on private subnets with [NAT gateway](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html) access to AWS services, but not a direct route to an Internet Gateway. Therefore, your cluster’s VPC must have private subnets available.

You can create a profile with the following:
+  [`eksctl`](#eksctl_create_a_fargate_profile) 
+  [AWS Management Console](#console_create_a_fargate_profile) 

## `eksctl`
<a name="eksctl_create_a_fargate_profile"></a>

 **To create a Fargate profile with `eksctl` ** 

Create your Fargate profile with the following `eksctl` command, replacing every example value with your own values. You’re required to specify a namespace. However, the `--labels` option isn’t required.

```
eksctl create fargateprofile \
    --cluster my-cluster \
    --name my-fargate-profile \
    --namespace my-kubernetes-namespace \
    --labels key=value
```

You can use certain wildcards for `my-kubernetes-namespace` and `key=value` labels. For more information, see [Fargate profile wildcards](#fargate-profile-wildcards).

## AWS Management Console
<a name="console_create_a_fargate_profile"></a>

 **To create a Fargate profile with AWS Management Console ** 

1. Open the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. Choose the cluster to create a Fargate profile for.

1. Choose the **Compute** tab.

1. Under **Fargate profiles**, choose **Add Fargate profile**.

1. On the **Configure Fargate profile** page, do the following:

   1. For **Name**, enter a unique name for your Fargate profile, such as `my-profile`.

   1. For **Pod execution role**, choose the Pod execution role to use with your Fargate profile. Only the IAM roles with the `eks-fargate-pods.amazonaws.com` service principal are shown. If you don’t see any roles listed, you must create one. For more information, see [Amazon EKS Pod execution IAM role](pod-execution-role.md).

   1. Modify the selected **Subnets** as needed.
**Note**  
Only private subnets are supported for Pods that are running on Fargate.

   1. For **Tags**, you can optionally tag your Fargate profile. These tags don’t propagate to other resources that are associated with the profile, such as Pods.

   1. Choose **Next**.

1. On the **Configure Pod selection** page, do the following:

   1. For **Namespace**, enter a namespace to match for Pods.
      + You can use specific namespaces to match, such as `kube-system` or `default`.
      + You can use certain wildcards (for example, `prod-*`) to match multiple namespaces (for example, `prod-deployment` and `prod-test`). For more information, see [Fargate profile wildcards](#fargate-profile-wildcards).

   1. (Optional) Add Kubernetes labels to the selector. Specifically, add them to the one that the Pods in the specified namespace need to match.
      + You can add the label `infrastructure: fargate` to the selector so that only Pods in the specified namespace that also have the `infrastructure: fargate` Kubernetes label match the selector.
      + You can use certain wildcards (for example, `key?: value?`) to match multiple namespaces (for example, `keya: valuea` and `keyb: valueb`). For more information, see [Fargate profile wildcards](#fargate-profile-wildcards).

   1. Choose **Next**.

1. On the **Review and create** page, review the information for your Fargate profile and choose **Create**.

# Delete a Fargate profile
<a name="delete-fargate-profile"></a>

This topic describes how to delete a Fargate profile. When you delete a Fargate profile, any Pods that were scheduled onto Fargate with the profile are deleted. If those Pods match another Fargate profile, then they’re scheduled on Fargate with that profile. If they no longer match any Fargate profiles, then they aren’t scheduled onto Fargate and might remain as pending.

Only one Fargate profile in a cluster can be in the `DELETING` status at a time. You must wait for a Fargate profile to finish deleting before you can delete any other profiles in that cluster.

You can delete a profile with any of the following tools:
+  [`eksctl`](#eksctl_delete_a_fargate_profile) 
+  [AWS Management Console](#console_delete_a_fargate_profile) 
+  [AWS CLI](#awscli_delete_a_fargate_profile) 

## `eksctl`
<a name="eksctl_delete_a_fargate_profile"></a>

 **Delete a Fargate profile with `eksctl` ** 

Use the following command to delete a profile from a cluster. Replace every *example value* with your own values.

```
eksctl delete fargateprofile --name my-profile --cluster my-cluster
```

## AWS Management Console
<a name="console_delete_a_fargate_profile"></a>

 **Delete a Fargate profile with the AWS Management Console** 

1. Open the [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. In the left navigation pane, choose **Clusters**. In the list of clusters, choose the cluster that you want to delete the Fargate profile from.

1. Choose the **Compute** tab.

1. Choose the Fargate profile to delete, and then choose **Delete**.

1. On the **Delete Fargate profile** page, enter the name of the profile, and then choose **Delete**.

## AWS CLI
<a name="awscli_delete_a_fargate_profile"></a>

 **Delete a Fargate profile with AWS CLI** 

Use the following command to delete a profile from a cluster. Replace every *example value* with your own values.

```
aws eks delete-fargate-profile --fargate-profile-name my-profile --cluster-name my-cluster
```

# Understand Fargate Pod configuration details
<a name="fargate-pod-configuration"></a>

This section describes some of the unique Pod configuration details for running Kubernetes Pods on AWS Fargate.

## Pod CPU and memory
<a name="fargate-cpu-and-memory"></a>

With Kubernetes, you can define requests, which are the minimum vCPU and memory resources that are allocated to each container in a Pod. Kubernetes schedules Pods to ensure that at least the requested resources for each Pod are available on the compute resource. For more information, see [Managing compute resources for containers](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/) in the Kubernetes documentation.

**Note**  
Because Amazon EKS Fargate runs only one Pod per node, the scenario of evicting Pods because of insufficient resources doesn’t occur. All Amazon EKS Fargate Pods run with guaranteed priority, so the requested CPU and memory must be equal to the limit for all of the containers. For more information, see [Configure Quality of Service for Pods](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/) in the Kubernetes documentation.

When Pods are scheduled on Fargate, the vCPU and memory reservations within the Pod specification determine how much CPU and memory to provision for the Pod.
+ The maximum request out of any Init containers is used to determine the Init request vCPU and memory requirements.
+ Requests for all long-running containers are added up to determine the long-running request vCPU and memory requirements.
+ The larger of the previous two values is chosen for the vCPU and memory request to use for your Pod.
+ Fargate adds 256 MB to each Pod’s memory reservation for the required Kubernetes components (`kubelet`, `kube-proxy`, and `containerd`).

Fargate rounds up to the compute configuration in the following table that most closely matches the sum of vCPU and memory requests, to ensure that Pods always have the resources that they need to run.

If you don’t specify a vCPU and memory combination, then the smallest available combination is used (.25 vCPU and 0.5 GB memory).

The following table shows the vCPU and memory combinations that are available for Pods running on Fargate.


| vCPU value | Memory value | 
| --- | --- | 
|  .25 vCPU  |  0.5 GB, 1 GB, 2 GB  | 
|  .5 vCPU  |  1 GB, 2 GB, 3 GB, 4 GB  | 
|  1 vCPU  |  2 GB, 3 GB, 4 GB, 5 GB, 6 GB, 7 GB, 8 GB  | 
|  2 vCPU  |  Between 4 GB and 16 GB in 1-GB increments  | 
|  4 vCPU  |  Between 8 GB and 30 GB in 1-GB increments  | 
|  8 vCPU  |  Between 16 GB and 60 GB in 4-GB increments  | 
|  16 vCPU  |  Between 32 GB and 120 GB in 8-GB increments  | 

The additional memory reserved for the Kubernetes components can cause a Fargate task with more vCPUs than requested to be provisioned. For example, a request for 1 vCPU and 8 GB memory will have 256 MB added to its memory request, and will provision a Fargate task with 2 vCPUs and 9 GB memory, since no task with 1 vCPU and 9 GB memory is available.
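The following hedged example shows how these rules combine. The Pod spec uses hypothetical names and values; its long-running containers request a total of 0.5 vCPU and 1.5 GiB of memory, with limits equal to requests as the guaranteed model above requires.

```
apiVersion: v1
kind: Pod
metadata:
  name: sizing-example                  # hypothetical name
  namespace: my-fargate-namespace       # must match a Fargate profile selector
spec:
  containers:
    - name: app
      image: nginx:latest
      resources:
        requests:
          cpu: 300m
          memory: 1Gi
        limits:                         # equal to requests, per the guaranteed model above
          cpu: 300m
          memory: 1Gi
    - name: sidecar
      image: busybox:latest
      command: ["sleep", "infinity"]
      resources:
        requests:
          cpu: 200m
          memory: 512Mi
        limits:
          cpu: 200m
          memory: 512Mi
```

After Fargate adds 256 MB for the Kubernetes components, the closest matching configuration in the preceding table is 0.5 vCPU and 2 GB, which is what the `CapacityProvisioned` annotation would be expected to report for this Pod.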

There is no correlation between the size of the Pod running on Fargate and the node size reported by Kubernetes with `kubectl get nodes`. The reported node size is often larger than the Pod’s capacity. You can verify Pod capacity with the following command. Replace *default* with your Pod’s namespace and *pod-name* with the name of your Pod.

```
kubectl describe pod --namespace default pod-name
```

An example output is as follows.

```
[...]
annotations:
    CapacityProvisioned: 0.25vCPU 0.5GB
[...]
```

The `CapacityProvisioned` annotation represents the enforced Pod capacity and it determines the cost of your Pod running on Fargate. For pricing information for the compute configurations, see [AWS Fargate Pricing](https://aws.amazon.com/fargate/pricing/).

## Fargate storage
<a name="fargate-storage"></a>

A Pod running on Fargate automatically mounts an Amazon EFS file system, without needing manual driver installation steps. You can’t use dynamic persistent volume provisioning with Fargate nodes, but you can use static provisioning. For more information, see [Amazon EFS CSI Driver](https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/docs/README.md) on GitHub.

When provisioned, each Pod running on Fargate receives a default 20 GiB of ephemeral storage. This type of storage is deleted after a Pod stops. New Pods launched onto Fargate have encryption of the ephemeral storage volume enabled by default. The ephemeral Pod storage is encrypted with an AES-256 encryption algorithm using AWS Fargate managed keys.

**Note**  
The default usable storage for Amazon EKS Pods that run on Fargate is less than 20 GiB. This is because some space is used by the `kubelet` and other Kubernetes modules that are loaded inside the Pod.

You can increase the total amount of ephemeral storage up to a maximum of 175 GiB. To configure the size with Kubernetes, specify the `ephemeral-storage` resource request for each container in a Pod. When Kubernetes schedules Pods, it ensures that the sum of the resource requests for each Pod is less than the capacity of the Fargate task. For more information, see [Resource Management for Pods and Containers](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/) in the Kubernetes documentation.

Amazon EKS Fargate provisions more ephemeral storage than requested for the purposes of system use. For example, a request of 100 GiB will provision a Fargate task with 115 GiB ephemeral storage.
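The following is a minimal sketch of requesting larger ephemeral storage; the Pod name, namespace, and size are illustrative.

```
apiVersion: v1
kind: Pod
metadata:
  name: storage-example                 # hypothetical name
  namespace: my-fargate-namespace
spec:
  containers:
    - name: app
      image: nginx:latest
      resources:
        requests:
          ephemeral-storage: 100Gi      # Fargate provisions additional space beyond this for system use
        limits:
          ephemeral-storage: 100Gi
```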

# Set actions for AWS Fargate OS patching events
<a name="fargate-pod-patching"></a>

Amazon EKS periodically patches the OS for AWS Fargate nodes to keep them secure. As part of the patching process, we recycle the nodes to install OS patches. Updates are attempted in a way that creates the least impact on your services. However, if Pods aren’t successfully evicted, there are times when they must be deleted. The following are actions that you can take to minimize potential disruptions:
+ Set appropriate Pod disruption budgets (PDBs) to control the number of Pods that are down simultaneously.
+ Create Amazon EventBridge rules to handle failed evictions before the Pods are deleted.
+ Manually restart your affected pods before the eviction date posted in the notification you receive.
+ Create a notification configuration in AWS User Notifications.

Amazon EKS works closely with the Kubernetes community to make bug fixes and security patches available as quickly as possible. All Fargate Pods start on the most recent Kubernetes patch version, which is available from Amazon EKS for the Kubernetes version of your cluster. If you have a Pod with an older patch version, Amazon EKS might recycle it to update it to the latest version. This ensures that your Pods are equipped with the latest security updates. That way, if there’s a critical [Common Vulnerabilities and Exposures](https://cve.mitre.org/) (CVE) issue, you’re kept up to date to reduce security risks.

When the AWS Fargate OS is updated, Amazon EKS will send you a notification that includes your affected resources and the date of upcoming pod evictions. If the provided eviction date is inconvenient, you have the option to manually restart your affected pods before the eviction date posted in the notification. Any pods created before the time at which you receive the notification are subject to eviction. Refer to the [Kubernetes Documentation](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_rollout/kubectl_rollout_restart) for further instructions on how to manually restart your pods.
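For example, assuming the affected Pods are managed by a Deployment (the name and namespace below are placeholders), you can restart them with `kubectl rollout restart`.

```
kubectl rollout restart deployment my-deployment --namespace my-namespace
```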

To limit the number of Pods that are down at one time when Pods are recycled, you can set Pod disruption budgets (PDBs). You can use PDBs to define minimum availability based on the requirements of each of your applications while still allowing updates to occur. Your PDB’s minimum availability must be less than 100%. For more information, see [Specifying a Disruption Budget for your Application](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) in the Kubernetes Documentation.
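The following is a minimal PDB sketch, assuming a workload whose Pods carry the hypothetical label `app: my-app`. It keeps at least two Pods available while still allowing Fargate to recycle nodes.

```
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: default
spec:
  minAvailable: 2               # must allow some disruption; requiring 100% availability blocks eviction
  selector:
    matchLabels:
      app: my-app
```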

Amazon EKS uses the [Eviction API](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/#eviction-api) to safely drain the Pod while respecting the PDBs that you set for the application. Pods are evicted by Availability Zone to minimize impact. If the eviction succeeds, the new Pod gets the latest patch and no further action is required.

When the eviction for a Pod fails, Amazon EKS sends an event to your account with details about the Pods that failed eviction. You can act on the message before the scheduled termination time. The specific time varies based on the urgency of the patch. When it’s time, Amazon EKS attempts to evict the Pods again. However, this time a new event isn’t sent if the eviction fails. If the eviction fails again, your existing Pods are deleted periodically so that the new Pods can have the latest patch.

The following is a sample event received when the Pod eviction fails. It contains details about the cluster, Pod name, Pod namespace, Fargate profile, and the scheduled termination time.

```
{
    "version": "0",
    "id": "12345678-90ab-cdef-0123-4567890abcde",
    "detail-type": "EKS Fargate Pod Scheduled Termination",
    "source": "aws.eks",
    "account": "111122223333",
    "time": "2021-06-27T12:52:44Z",
    "region": "region-code",
    "resources": [
        "default/my-database-deployment"
    ],
    "detail": {
        "clusterName": "my-cluster",
        "fargateProfileName": "my-fargate-profile",
        "podName": "my-pod-name",
        "podNamespace": "default",
        "evictErrorMessage": "Cannot evict pod as it would violate the pod's disruption budget",
        "scheduledTerminationTime": "2021-06-30T12:52:44.832Z[UTC]"
    }
}
```

In addition, having multiple PDBs associated with a Pod can cause an eviction failure event. This event returns the following error message.

```
"evictErrorMessage": "This pod has multiple PodDisruptionBudget, which the eviction subresource does not support",
```

You can create a desired action based on this event. For example, you can adjust your Pod disruption budget (PDB) to control how the Pods are evicted. More specifically, suppose that you start with a PDB that specifies the target percentage of Pods that are available. Before your Pods are force terminated during an upgrade, you can adjust the PDB to a different percentage of Pods. To receive this event, you must create an Amazon EventBridge rule in the AWS account and AWS Region that the cluster belongs to. The rule must use the following **Custom pattern**. For more information, see [Creating Amazon EventBridge rules that react to events](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-create-rule.html) in the *Amazon EventBridge User Guide*.

```
{
  "source": ["aws.eks"],
  "detail-type": ["EKS Fargate Pod Scheduled Termination"]
}
```

A suitable target can be set for the event to capture it. For a complete list of available targets, see [Amazon EventBridge targets](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-targets.html) in the *Amazon EventBridge User Guide*. You can also create a notification configuration in AWS User Notifications. When using the AWS Management Console to create the notification, under **Event Rules**, choose **Elastic Kubernetes Service (EKS)** for ** AWS service name** and **EKS Fargate Pod Scheduled Termination** for **Event type**. For more information, see [Getting started with AWS User Notifications](https://docs.aws.amazon.com/notifications/latest/userguide/getting-started.html) in the AWS User Notifications User Guide.
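If you prefer the AWS CLI, the following hedged sketch creates a rule with the custom pattern and attaches an SNS topic as the target. The rule name and topic ARN are placeholders.

```
aws events put-rule \
  --name eks-fargate-pod-termination \
  --event-pattern '{"source":["aws.eks"],"detail-type":["EKS Fargate Pod Scheduled Termination"]}'

aws events put-targets \
  --rule eks-fargate-pod-termination \
  --targets "Id"="notify-sns","Arn"="arn:aws:sns:region-code:111122223333:my-topic"
```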

See [FAQs: Fargate Pod eviction notice](https://repost.aws/knowledge-center/fargate-pod-eviction-notice) in * AWS re:Post* for frequently asked questions regarding EKS Pod Evictions.

# Collect AWS Fargate app and usage metrics
<a name="monitoring-fargate-usage"></a>

You can collect system metrics and CloudWatch usage metrics for AWS Fargate.

## Application metrics
<a name="fargate-application-metrics"></a>

For applications running on Amazon EKS and AWS Fargate, you can use the AWS Distro for OpenTelemetry (ADOT). ADOT allows you to collect system metrics and send them to CloudWatch Container Insights dashboards. To get started with ADOT for applications running on Fargate, see [Using CloudWatch Container Insights with AWS Distro for OpenTelemetry](https://aws-otel.github.io/docs/getting-started/container-insights) in the ADOT documentation.

## Usage metrics
<a name="fargate-usage-metrics"></a>

You can use CloudWatch usage metrics to provide visibility into your account’s usage of resources. Use these metrics to visualize your current service usage on CloudWatch graphs and dashboards.

 AWS Fargate usage metrics correspond to AWS service quotas. You can configure alarms that alert you when your usage approaches a service quota. For more information about Fargate service quotas, see [View and manage Amazon EKS and Fargate service quotas](service-quotas.md).

AWS Fargate publishes the following metrics in the `AWS/Usage` namespace.


| Metric | Description | 
| --- | --- | 
|   `ResourceCount`   |  The total number of the specified resource running on your account. The resource is defined by the dimensions associated with the metric.  | 

The following dimensions are used to refine the usage metrics that are published by AWS Fargate.


| Dimension | Description | 
| --- | --- | 
|   `Service`   |  The name of the AWS service containing the resource. For AWS Fargate usage metrics, the value for this dimension is `Fargate`.  | 
|   `Type`   |  The type of entity that’s being reported. Currently, the only valid value for AWS Fargate usage metrics is `Resource`.  | 
|   `Resource`   |  The type of resource that’s running. Currently, AWS Fargate returns information on your Fargate On-Demand usage. The resource value for Fargate On-Demand usage is `OnDemand`. Note: Fargate On-Demand usage combines Amazon EKS Pods using Fargate, Amazon ECS tasks using the Fargate launch type, and Amazon ECS tasks using the `FARGATE` capacity provider.  | 
|   `Class`   |  The class of resource being tracked. Currently, AWS Fargate doesn’t use the class dimension.  | 

### Creating a CloudWatch alarm to monitor Fargate resource usage metrics
<a name="service-quota-alarm"></a>

 AWS Fargate provides CloudWatch usage metrics that correspond to the AWS service quotas for Fargate On-Demand resource usage. In the Service Quotas console, you can visualize your usage on a graph. You can also configure alarms that alert you when your usage approaches a service quota. For more information, see [Collect AWS Fargate app and usage metrics](#monitoring-fargate-usage).

Use the following steps to create a CloudWatch alarm based on the Fargate resource usage metrics.

1. Open the Service Quotas console at https://console.aws.amazon.com/servicequotas/.

1. In the left navigation pane, choose ** AWS services**.

1. From the ** AWS services** list, search for and select ** AWS Fargate**.

1. In the **Service quotas** list, choose the Fargate usage quota you want to create an alarm for.

1. In the Amazon CloudWatch alarms section, choose **Create**.

1. For **Alarm threshold**, choose the percentage of your applied quota value that you want to set as the alarm value.

1. For **Alarm name**, enter a name for the alarm and then choose **Create**.
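Alternatively, the following AWS CLI sketch creates an alarm directly on the usage metric. The threshold and SNS topic ARN are placeholders, and you should verify the exact dimensions of the metric in your account before relying on this example.

```
aws cloudwatch put-metric-alarm \
  --alarm-name fargate-ondemand-usage-alarm \
  --namespace "AWS/Usage" \
  --metric-name ResourceCount \
  --dimensions Name=Service,Value=Fargate Name=Type,Value=Resource Name=Resource,Value=OnDemand \
  --statistic Maximum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 800 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:region-code:111122223333:my-topic
```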

# Start AWS Fargate logging for your cluster
<a name="fargate-logging"></a>

Amazon EKS on Fargate offers a built-in log router based on Fluent Bit. This means that you don’t explicitly run a Fluent Bit container as a sidecar, but Amazon runs it for you. All that you have to do is configure the log router. The configuration happens through a dedicated `ConfigMap` that must meet the following criteria:
+ Named `aws-logging` 
+ Created in a dedicated namespace called `aws-observability` 
+ Can’t exceed 5300 characters.

Once you’ve created the `ConfigMap`, Amazon EKS on Fargate automatically detects it and configures the log router with it. Fargate uses a version of AWS for Fluent Bit, an upstream compliant distribution of Fluent Bit managed by AWS. For more information, see [AWS for Fluent Bit](https://github.com/aws/aws-for-fluent-bit) on GitHub.

The log router allows you to use the breadth of services at AWS for log analytics and storage. You can stream logs from Fargate directly to Amazon CloudWatch and Amazon OpenSearch Service. You can also stream logs to destinations such as [Amazon S3](https://aws.amazon.com/s3/), [Amazon Kinesis Data Streams](https://aws.amazon.com/kinesis/data-streams/), and partner tools through [Amazon Data Firehose](https://aws.amazon.com/kinesis/data-firehose/). Before you begin, you need the following:
+ An existing Fargate profile that specifies an existing Kubernetes namespace that you deploy Fargate Pods to. For more information, see [Step 3: Create a Fargate profile for your cluster](fargate-getting-started.md#fargate-gs-create-profile).
+ An existing Fargate Pod execution role. For more information, see [Step 2: Create a Fargate Pod execution role](fargate-getting-started.md#fargate-sg-pod-execution-role).

## Log router configuration
<a name="fargate-logging-log-router-configuration"></a>

**Important**  
For logs to be successfully published, there must be network access from the VPC that your cluster is in to the log destination. This mainly concerns users customizing egress rules for their VPC. For an example using CloudWatch, see [Using CloudWatch Logs with interface VPC endpoints](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/cloudwatch-logs-and-interface-VPC.html) in the *Amazon CloudWatch Logs User Guide*.

In the following steps, replace every *example value* with your own values.

1. Create a dedicated Kubernetes namespace named `aws-observability`.

   1. Save the following contents to a file named `aws-observability-namespace.yaml` on your computer. The value for `name` must be `aws-observability` and the `aws-observability: enabled` label is required.

      ```
      kind: Namespace
      apiVersion: v1
      metadata:
        name: aws-observability
        labels:
          aws-observability: enabled
      ```

   1. Create the namespace.

      ```
      kubectl apply -f aws-observability-namespace.yaml
      ```

1. Create a `ConfigMap` with a `Fluent Conf` data value to ship container logs to a destination. Fluent Conf is the configuration language for Fluent Bit, a fast and lightweight log processor, and is used to route container logs to a log destination of your choice. For more information, see [Configuration File](https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/classic-mode/configuration-file) in the Fluent Bit documentation.
**Important**  
The main sections included in a typical `Fluent Conf` are `Service`, `Input`, `Filter`, and `Output`. The Fargate log router, however, only accepts the `Filter`, `Output`, and `Parser` sections. If you provide any other sections, they will be rejected.

   The Fargate log router manages the `Service` and `Input` sections. It has the following `Input` section, which can’t be modified and isn’t needed in your `ConfigMap`. However, you can get insights from it, such as the memory buffer limit and the tag applied for logs.

   ```
   [INPUT]
       Name tail
       Buffer_Max_Size 66KB
       DB /var/log/flb_kube.db
       Mem_Buf_Limit 45MB
       Path /var/log/containers/*.log
       Read_From_Head On
       Refresh_Interval 10
       Rotate_Wait 30
       Skip_Long_Lines On
       Tag kube.*
   ```

   When creating the `ConfigMap`, take into account the following rules that Fargate uses to validate fields:
   +  `[FILTER]`, `[OUTPUT]`, and `[PARSER]` must each be specified under their corresponding key. For example, `[FILTER]` must be under `filters.conf`. You can have one or more `[FILTER]` sections under `filters.conf`. The `[OUTPUT]` and `[PARSER]` sections must also be under their corresponding keys. By specifying multiple `[OUTPUT]` sections, you can route your logs to different destinations at the same time.
   + Fargate validates the required keys for each section. `Name` and `match` are required for each `[FILTER]` and `[OUTPUT]`. `Name` and `format` are required for each `[PARSER]`. The keys are case-insensitive.
   + Environment variables such as `${ENV_VAR}` aren’t allowed in the `ConfigMap`.
   + The indentation has to be the same for either directive or key-value pair within each of `filters.conf`, `output.conf`, and `parsers.conf`. Key-value pairs have to be indented more than directives.
   + Fargate validates against the following supported filters: `grep`, `parser`, `record_modifier`, `rewrite_tag`, `throttle`, `nest`, `modify`, and `kubernetes`.
   + Fargate validates against the following supported output plugins: `es`, `firehose`, `kinesis_firehose`, `cloudwatch`, `cloudwatch_logs`, and `kinesis`.
   + At least one supported `Output` plugin has to be provided in the `ConfigMap` to enable logging. `Filter` and `Parser` aren’t required to enable logging.

     You can also run Fluent Bit on Amazon EC2 using the desired configuration to troubleshoot any issues that arise from validation. Create your `ConfigMap` using one of the following examples.
**Important**  
Amazon EKS Fargate logging doesn’t support dynamic configuration of a `ConfigMap`. Any changes to a `ConfigMap` are applied to new Pods only. Changes aren’t applied to existing Pods.

     Create a `ConfigMap` using the example for your desired log destination.
**Note**  
You can also use Amazon Kinesis Data Streams for your log destination. If you use Kinesis Data Streams, make sure that the pod execution role has been granted the `kinesis:PutRecords` permission. For more information, see Amazon Kinesis Data Streams [Permissions](https://docs.fluentbit.io/manual/pipeline/outputs/kinesis#permissions) in the *Fluent Bit: Official Manual*.  
**Example**  

------
#### [ CloudWatch ]

   You have two output options when using CloudWatch:
   +  [An output plugin written in C](https://docs.fluentbit.io/manual/v/1.5/pipeline/outputs/cloudwatch) 
   +  [An output plugin written in Golang](https://github.com/aws/amazon-cloudwatch-logs-for-fluent-bit) 

   The following example shows you how to use the `cloudwatch_logs` plugin to send logs to CloudWatch.

   1. Save the following contents to a file named `aws-logging-cloudwatch-configmap.yaml`. Replace *region-code* with the AWS Region that your cluster is in. The parameters under `[OUTPUT]` are required.

      ```
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: aws-logging
        namespace: aws-observability
      data:
        flb_log_cw: "false"  # Set to true to ship Fluent Bit process logs to CloudWatch.
        filters.conf: |
          [FILTER]
              Name parser
              Match *
              Key_name log
              Parser crio
          [FILTER]
              Name kubernetes
              Match kube.*
              Merge_Log On
              Keep_Log Off
              Buffer_Size 0
              Kube_Meta_Cache_TTL 300s
        output.conf: |
          [OUTPUT]
              Name cloudwatch_logs
              Match   kube.*
              region region-code
              log_group_name my-logs
              log_stream_prefix from-fluent-bit-
              log_retention_days 60
              auto_create_group true
        parsers.conf: |
          [PARSER]
              Name crio
              Format Regex
              Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>P|F) (?<log>.*)$
              Time_Key    time
              Time_Format %Y-%m-%dT%H:%M:%S.%L%z
      ```

   1. Apply the manifest to your cluster.

      ```
      kubectl apply -f aws-logging-cloudwatch-configmap.yaml
      ```

------
#### [ Amazon OpenSearch Service ]

   If you want to send logs to Amazon OpenSearch Service, you can use [es](https://docs.fluentbit.io/manual/v/1.5/pipeline/outputs/elasticsearch) output, which is a plugin written in C. The following example shows you how to use the plugin to send logs to OpenSearch.

   1. Save the following contents to a file named `aws-logging-opensearch-configmap.yaml`. Replace every *example value* with your own values.

      ```
      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: aws-logging
        namespace: aws-observability
      data:
        output.conf: |
          [OUTPUT]
            Name  es
            Match *
            Host  search-example-gjxdcilagiprbglqn42jsty66y.region-code.es.amazonaws.com
            Port  443
            Index example
            Type  example_type
            AWS_Auth On
            AWS_Region region-code
            tls   On
      ```

   1. Apply the manifest to your cluster.

      ```
      kubectl apply -f aws-logging-opensearch-configmap.yaml
      ```

------
#### [ Firehose ]

   You have two output options when sending logs to Firehose:
   +  [kinesis_firehose](https://docs.fluentbit.io/manual/pipeline/outputs/firehose) – An output plugin written in C.
   +  [firehose](https://github.com/aws/amazon-kinesis-firehose-for-fluent-bit) – An output plugin written in Golang.

     The following example shows you how to use the `kinesis_firehose` plugin to send logs to Firehose.

     1. Save the following contents to a file named `aws-logging-firehose-configmap.yaml`. Replace *region-code* with the AWS Region that your cluster is in.

        ```
        kind: ConfigMap
        apiVersion: v1
        metadata:
          name: aws-logging
          namespace: aws-observability
        data:
          output.conf: |
            [OUTPUT]
             Name  kinesis_firehose
             Match *
             region region-code
             delivery_stream my-stream-firehose
        ```

     1. Apply the manifest to your cluster.

        ```
        kubectl apply -f aws-logging-firehose-configmap.yaml
        ```

------

1. Set up permissions for the Fargate Pod execution role to send logs to your destination.

   1. Download the IAM policy for your destination to your computer.  
**Example**  

------
#### [ CloudWatch ]

      Download the CloudWatch IAM policy to your computer. You can also [view the policy](https://raw.githubusercontent.com/aws-samples/amazon-eks-fluent-logging-examples/mainline/examples/fargate/cloudwatchlogs/permissions.json) on GitHub.

      ```
      curl -O https://raw.githubusercontent.com/aws-samples/amazon-eks-fluent-logging-examples/mainline/examples/fargate/cloudwatchlogs/permissions.json
      ```

------
#### [ Amazon OpenSearch Service ]

      Download the OpenSearch IAM policy to your computer. You can also [view the policy](https://raw.githubusercontent.com/aws-samples/amazon-eks-fluent-logging-examples/mainline/examples/fargate/amazon-elasticsearch/permissions.json) on GitHub.

      ```
      curl -O https://raw.githubusercontent.com/aws-samples/amazon-eks-fluent-logging-examples/mainline/examples/fargate/amazon-elasticsearch/permissions.json
      ```

      Make sure that OpenSearch Dashboards' access control is configured properly. The `all_access` role in OpenSearch Dashboards needs to have the Fargate Pod execution role and the IAM role mapped. The same mapping must be done for the `security_manager` role. You can add the previous mappings by selecting `Menu`, then `Security`, then `Roles`, and then selecting the respective roles. For more information, see [How do I troubleshoot CloudWatch Logs so that it streams to my Amazon ES domain?](https://aws.amazon.com/tr/premiumsupport/knowledge-center/es-troubleshoot-cloudwatch-logs/).

------
#### [ Firehose ]

      Download the Firehose IAM policy to your computer. You can also [view the policy](https://raw.githubusercontent.com/aws-samples/amazon-eks-fluent-logging-examples/mainline/examples/fargate/kinesis-firehose/permissions.json) on GitHub.

      ```
      curl -O https://raw.githubusercontent.com/aws-samples/amazon-eks-fluent-logging-examples/mainline/examples/fargate/kinesis-firehose/permissions.json
      ```

------

   1. Create an IAM policy from the policy file that you downloaded.

      ```
      aws iam create-policy --policy-name eks-fargate-logging-policy --policy-document file://permissions.json
      ```

   1. Attach the IAM policy to the pod execution role specified for your Fargate profile with the following command. Replace *111122223333* with your account ID. Replace *AmazonEKSFargatePodExecutionRole* with your Pod execution role (for more information, see [Step 2: Create a Fargate Pod execution role](fargate-getting-started.md#fargate-sg-pod-execution-role)).

      ```
      aws iam attach-role-policy \
        --policy-arn arn:aws:iam::111122223333:policy/eks-fargate-logging-policy \
        --role-name AmazonEKSFargatePodExecutionRole
      ```

### Kubernetes filter support
<a name="fargate-logging-kubernetes-filter"></a>

The Fluent Bit Kubernetes filter allows you to add Kubernetes metadata to your log files. For more information about the filter, see [Kubernetes](https://docs.fluentbit.io/manual/pipeline/filters/kubernetes) in the Fluent Bit documentation. The filter uses the cluster’s API server endpoint to retrieve the metadata, and you can apply it as shown in the following example.

```
filters.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Merge_Log           On
        Buffer_Size         0
        Kube_Meta_Cache_TTL 300s
```

**Important**  
 `Kube_URL`, `Kube_CA_File`, `Kube_Token_Command`, and `Kube_Token_File` are service-owned configuration parameters and must not be specified. Amazon EKS Fargate populates these values.  
 `Kube_Meta_Cache_TTL` is the time that Fluent Bit waits before it communicates with the API server for the latest metadata. If `Kube_Meta_Cache_TTL` isn’t specified, Amazon EKS Fargate applies a default value of 30 minutes to lessen the load on the API server.

### To ship Fluent Bit process logs to your account
<a name="ship-fluent-bit-process-logs"></a>

You can optionally ship Fluent Bit process logs to Amazon CloudWatch using the following `ConfigMap`. Shipping Fluent Bit process logs to CloudWatch requires additional log ingestion and storage costs. Replace *region-code* with the AWS Region that your cluster is in.

```
kind: ConfigMap
apiVersion: v1
metadata:
  name: aws-logging
  namespace: aws-observability
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  flb_log_cw: "true"  # Ships Fluent Bit process logs to CloudWatch.

  output.conf: |
    [OUTPUT]
        Name cloudwatch
        Match kube.*
        region region-code
        log_group_name fluent-bit-cloudwatch
        log_stream_prefix from-fluent-bit-
        auto_create_group true
```

The logs are in CloudWatch in the same AWS Region as the cluster. The log group name is `my-cluster-fluent-bit-logs` and the Fluent Bit log stream name is `fluent-bit-podname-pod-namespace`.

**Note**  
The process logs are shipped only when the Fluent Bit process successfully starts. If there is a failure while starting Fluent Bit, the process logs aren’t shipped. You can only ship process logs to CloudWatch.
To debug shipping process logs to your account, you can apply the previous `ConfigMap` to get the process logs. Fluent Bit failing to start is usually due to your `ConfigMap` not being parsed or accepted by Fluent Bit while starting.

### To stop shipping Fluent Bit process logs
<a name="stop-fluent-bit-process-logs"></a>

Shipping Fluent Bit process logs to CloudWatch requires additional log ingestion and storage costs. To exclude process logs in an existing `ConfigMap` setup, do the following steps.

1. Locate the CloudWatch log group automatically created for your Amazon EKS cluster’s Fluent Bit process logs after enabling Fargate logging. It follows the format `my-cluster-fluent-bit-logs`.

1. Delete the existing CloudWatch log streams created for each Pod’s process logs in the CloudWatch log group.

1. Edit the `ConfigMap` and set `flb_log_cw: "false"`, as shown in the example after these steps.

1. Restart any existing Pods in the cluster.
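A hedged sketch of applying step 3 with `kubectl patch` follows; adjust it if you manage the `ConfigMap` with other tooling.

```
kubectl patch configmap aws-logging \
  --namespace aws-observability \
  --type merge \
  --patch '{"data":{"flb_log_cw":"false"}}'
```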

## Test application
<a name="fargate-logging-test-application"></a>

1. Deploy a sample Pod.

   1. Save the following contents to a file named `sample-app.yaml` on your computer.

      ```
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: sample-app
        namespace: same-namespace-as-your-fargate-profile
      spec:
        replicas: 3
        selector:
          matchLabels:
            app: nginx
        template:
          metadata:
            labels:
              app: nginx
          spec:
            containers:
              - name: nginx
                image: nginx:latest
                ports:
                  - name: http
                    containerPort: 80
      ```

   1. Apply the manifest to the cluster.

      ```
      kubectl apply -f sample-app.yaml
      ```

1. View the NGINX logs using the destination(s) that you configured in the `ConfigMap`.
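For example, if you used the earlier CloudWatch example, one way to confirm that logs are arriving is to tail the log group with the AWS CLI. The log group name `my-logs` is an assumption carried over from that example.

```
aws logs tail my-logs --follow --since 10m
```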

## Size considerations
<a name="fargate-logging-size-considerations"></a>

We suggest that you plan for up to 50 MB of memory for the log router. If you expect your application to generate logs at very high throughput, plan for up to 100 MB.

## Troubleshooting
<a name="fargate-logging-troubleshooting"></a>

To confirm whether the logging feature is enabled or disabled, and why (for example, because of an invalid `ConfigMap`), check your Pod events with `kubectl describe pod pod-name`. The output might include Pod events that clarify whether logging is enabled or not, such as the following example output.

```
[...]
Annotations:          CapacityProvisioned: 0.25vCPU 0.5GB
                      Logging: LoggingDisabled: LOGGING_CONFIGMAP_NOT_FOUND
[...]
Events:
  Type     Reason           Age        From                                                           Message
  ----     ------           ----       ----                                                           -------
  Warning  LoggingDisabled  <unknown>  fargate-scheduler                                              Disabled logging because aws-logging configmap was not found. configmap "aws-logging" not found
```

The Pod events are ephemeral, and how long they’re retained depends on your cluster settings. You can also view a Pod’s annotations using `kubectl describe pod pod-name`. The Pod annotation includes information about whether the logging feature is enabled or disabled and the reason.

# Choose an optimal Amazon EC2 node instance type
<a name="choosing-instance-type"></a>

Amazon EC2 provides a wide selection of instance types for worker nodes. Each instance type offers different compute, memory, storage, and network capabilities, and each instance type is grouped into an instance family based on these capabilities. For a list, see [Available instance types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html#AvailableInstanceTypes) in the *Amazon EC2 User Guide*. Amazon EKS releases several variations of Amazon EC2 AMIs to support different instance types and architectures. To make sure that the instance type you select is compatible with Amazon EKS, consider the following criteria.
+ No Amazon EKS AMIs currently support the `mac` family.
+ Arm and non-accelerated Amazon EKS AMIs don’t support the `g3`, `g4`, `inf`, and `p` families.
+ Accelerated Amazon EKS AMIs don’t support the `a`, `c`, `hpc`, `m`, and `t` families.
+ For Arm-based instances, Amazon Linux 2023 (AL2023) only supports instance types that use Graviton2 or later processors. AL2023 doesn’t support `A1` instances.

When choosing between instance types that are supported by Amazon EKS, consider the following capabilities of each type.

 **Number of instances in a node group**   
In general, fewer, larger instances are better, especially if you have a lot of DaemonSets. Each instance requires API calls to the API server, so the more instances you have, the more load on the API server.

 **Operating system**   
Review the supported instance types for [Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html), [Windows](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/instance-types.html), and [Bottlerocket](https://aws.amazon.com/bottlerocket/faqs/). Before creating Windows instances, review [Deploy Windows nodes on EKS clusters](windows-support.md).

 **Hardware architecture**   
Do you need x86 or Arm? Before deploying Arm instances, review [Amazon EKS optimized Arm Amazon Linux AMIs](eks-optimized-ami.md#arm-ami). Do you need instances built on the Nitro System ( [Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html#ec2-nitro-instances) or [Windows](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/instance-types.html#ec2-nitro-instances)) or that have [Accelerated](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/accelerated-computing-instances.html) capabilities? If you need accelerated capabilities, you can only use Linux with Amazon EKS.

 **Maximum number of Pods**   
Since each Pod is assigned its own IP address, the number of IP addresses supported by an instance type is a factor in determining the number of Pods that can run on the instance. To understand how the maximum number of Pods is determined for an instance type, see [How maxPods is determined](#max-pods-precedence).  
 [AWS Nitro System](https://aws.amazon.com/ec2/nitro/) instance types optionally support significantly more IP addresses than non-Nitro System instance types. However, not all IP addresses assigned for an instance are available to Pods. To assign a significantly larger number of IP addresses to your instances, you must have version `1.9.0` or later of the Amazon VPC CNI add-on installed in your cluster and configured appropriately. For more information, see [Assign more IP addresses to Amazon EKS nodes with prefixes](cni-increase-ip-addresses.md). To assign the largest number of IP addresses to your instances, you must have version `1.10.1` or later of the Amazon VPC CNI add-on installed in your cluster and deploy the cluster with the `IPv6` family.

 **IP family**   
You can use any supported instance type when using the `IPv4` family for a cluster, which allows your cluster to assign private `IPv4` addresses to your Pods and Services. But if you want to use the `IPv6` family for your cluster, then you must use [AWS Nitro System](https://aws.amazon.com/ec2/nitro/) instance types or bare metal instance types. Only `IPv4` is supported for Windows instances. Your cluster must be running version `1.10.1` or later of the Amazon VPC CNI add-on. For more information about using `IPv6`, see [Learn about IPv6 addresses to clusters, Pods, and services](cni-ipv6.md).

 **Version of the Amazon VPC CNI add-on that you’re running**   
The latest version of the [Amazon VPC CNI plugin for Kubernetes](https://github.com/aws/amazon-vpc-cni-k8s) supports [these instance types](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/vpc/vpc_ip_resource_limit.go). You may need to update your Amazon VPC CNI add-on version to take advantage of the latest supported instance types. For more information, see [Assign IPs to Pods with the Amazon VPC CNI](managing-vpc-cni.md). The latest version supports the latest features for use with Amazon EKS. Earlier versions don’t support all features. You can view features supported by different versions in the [Changelog](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/CHANGELOG.md) on GitHub.

 ** AWS Region that you’re creating your nodes in**   
Not all instance types are available in all AWS Regions.

 **Whether you’re using security groups for Pods**   
If you’re using security groups for Pods, only specific instance types are supported. For more information, see [Assign security groups to individual Pods](security-groups-for-pods.md).

## How maxPods is determined
<a name="max-pods-precedence"></a>

The final `maxPods` value applied to a node depends on several components that interact in a specific order of precedence. Understanding this order helps you avoid unexpected behavior when customizing `maxPods`.

 **Order of precedence (highest to lowest):** 

1.  **Managed node group enforcement** – When you use a managed node group without a [custom AMI](launch-templates.md#launch-template-custom-ami), Amazon EKS enforces a cap on `maxPods` in the node’s user data. For instances with fewer than 30 vCPUs, the cap is `110`. For instances with 30 or more vCPUs, the cap is `250`. This value takes precedence over any other `maxPods` configuration, including `maxPodsExpression`.

1.  **kubelet `maxPods` configuration** – If you set `maxPods` directly in the kubelet configuration (for example, through a launch template with a custom AMI), this value takes precedence over `maxPodsExpression`.

1.  **nodeadm `maxPodsExpression`** – If you use [maxPodsExpression](https://awslabs.github.io/amazon-eks-ami/nodeadm/doc/examples/#defining-a-max-pods-expression) in your `NodeConfig`, nodeadm evaluates the expression to calculate `maxPods`. This is only effective when the value is not already set by a higher-precedence source.

1.  **Default ENI-based calculation** – If no other value is set, the AMI calculates `maxPods` based on the number of elastic network interfaces and IP addresses supported by the instance type. This is equivalent to the formula `(number of ENIs × (IPs per ENI − 1)) + 2`. The `+ 2` accounts for the Amazon VPC CNI and `kube-proxy` running on every node, which don’t consume a Pod IP address.

**Important**  
If you use a managed node group and set `maxPodsExpression` in your `NodeConfig`, the managed node group’s enforcement overrides your expression. To use a custom `maxPods` value with managed node groups, you must specify a custom AMI in your launch template and set `maxPods` directly. For more information, see [Customize managed nodes with launch templates](launch-templates.md).

 **Managed node groups vs. self-managed nodes** 

With managed node groups (without a custom AMI), Amazon EKS injects the `maxPods` value into the node’s bootstrap user data. This means:
+ The `maxPods` value is always capped at `110` or `250` depending on instance size.
+ Any `maxPodsExpression` you configure is overridden by this injected value.
+ To use a different `maxPods` value, specify a custom AMI in your launch template and pass `--use-max-pods false` along with `--kubelet-extra-args '--max-pods=my-value'` to the `bootstrap.sh` script. For examples, see [Customize managed nodes with launch templates](launch-templates.md).

With self-managed nodes, you have full control over the bootstrap process. You can use `maxPodsExpression` in your `NodeConfig` or pass `--max-pods` directly to `bootstrap.sh`.
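The following sketch illustrates both the default calculation and an explicit override. The ENI and IP counts are illustrative (check the limits for your instance type), and the cluster name is a placeholder.

```
# Default ENI-based calculation for an instance with 3 network interfaces
# and 10 IPv4 addresses per interface:
#   (3 × (10 − 1)) + 2 = 29 Pods

# Overriding the value on a self-managed AL2 node at bootstrap:
/etc/eks/bootstrap.sh my-cluster \
  --use-max-pods false \
  --kubelet-extra-args '--max-pods=110'
```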

## Considerations for EKS Auto Mode
<a name="_considerations_for_eks_auto_mode"></a>

EKS Auto Mode limits the number of Pods on a node to the lower of the following:
+ A hard cap of 110 Pods
+ The result of the max Pods calculation described above

# Create nodes with pre-built optimized images
<a name="eks-optimized-amis"></a>

You can deploy nodes with pre-built Amazon EKS optimized [Amazon Machine Images](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) (AMIs) or your own custom AMIs when you use managed node groups or self-managed nodes. If you are running hybrid nodes, see [Prepare operating system for hybrid nodes](hybrid-nodes-os.md). For information about each type of Amazon EKS optimized AMI, see one of the following topics. For instructions on how to create your own custom AMI, see [Build a custom EKS-optimized Amazon Linux AMI](eks-ami-build-scripts.md).

With Amazon EKS Auto Mode, EKS manages the EC2 instances, including selecting and updating the AMI.

**Topics**
+ [Guide to EKS AL2 & AL2-Accelerated AMIs transition features](eks-ami-deprecation-faqs.md)
+ [Create nodes with optimized Amazon Linux AMIs](eks-optimized-ami.md)
+ [Create nodes with optimized Bottlerocket AMIs](eks-optimized-ami-bottlerocket.md)
+ [Create nodes with optimized Ubuntu Linux AMIs](eks-partner-amis.md)
+ [Create nodes with optimized Windows AMIs](eks-optimized-windows-ami.md)

# Guide to EKS AL2 & AL2-Accelerated AMIs transition features
<a name="eks-ami-deprecation-faqs"></a>

**Warning**  
Amazon EKS stopped publishing EKS-optimized Amazon Linux 2 (AL2) AMIs on November 26, 2025. AL2023-based and Bottlerocket-based AMIs for Amazon EKS are available for all supported Kubernetes versions, including 1.33 and higher.

 AWS will end support for EKS AL2-optimized and AL2-accelerated AMIs, effective November 26, 2025. While you can continue using EKS AL2 AMIs after the end-of-support (EOS) date (November 26, 2025), EKS will no longer release any new Kubernetes versions or updates to AL2 AMIs, including minor releases, patches, and bug fixes after this date. We recommend upgrading to Amazon Linux 2023 (AL2023) or Bottlerocket AMIs:
+ AL2023 enables a secure-by-default approach with preconfigured security policies, SELinux in permissive mode, IMDSv2-only mode enabled by default, optimized boot times, and improved package management for enhanced security and performance, well-suited for infrastructure requiring significant customizations like direct OS-level access or extensive node changes. To learn more, see [AL2023 FAQs](https://aws.amazon.com/linux/amazon-linux-2023/faqs/) or view our detailed migration guidance at [Upgrade from Amazon Linux 2 to Amazon Linux 2023](al2023.md).
+ Bottlerocket enables enhanced security, faster boot times, and a smaller attack surface for improved efficiency with its purpose-built, container-optimized design, well-suited for container-native approaches with minimal node customizations. To learn more, see [Bottlerocket FAQs](https://aws.amazon.com/bottlerocket/faqs/) or view our detailed migration guidance at [Create nodes with optimized Bottlerocket AMIs](eks-optimized-ami-bottlerocket.md).

Alternatively, you can [Build a custom EKS-optimized Amazon Linux AMI](eks-ami-build-scripts.md) until the EOS date (November 26, 2025). Additionally, you can build a custom AMI with an Amazon Linux 2 base instance until the Amazon Linux 2 EOS date (June 30, 2026).

## Migration and support FAQs
<a name="_migration_and_support_faqs"></a>

### How do I migrate from my AL2 to an AL2023 AMI?
<a name="_how_do_i_migrate_from_my_al2_to_an_al2023_ami"></a>

We recommend creating and implementing a migration plan that includes thorough application workload testing and documented rollback procedures, and then following the step-by-step instructions in [Upgrade from Amazon Linux 2 to Amazon Linux 2023](https://docs.aws.amazon.com/eks/latest/userguide/al2023.html) in the official EKS documentation.

### Can I build a custom AL2 AMI past the EKS end-of-support (EOS) date for EKS optimized AL2 AMIs?
<a name="_can_i_build_a_custom_al2_ami_past_the_eks_end_of_support_eos_date_for_eks_optimized_al2_amis"></a>

While we recommend moving to officially supported and published EKS optimized AMIs for AL2023 or Bottlerocket, you can build custom EKS AL2-optimized and AL2-accelerated AMIs until the AL2 AMI EOS date (November 26, 2025). Alternatively, you can build a custom AMI with an Amazon Linux 2 base instance until the Amazon Linux 2 EOS date (June 30, 2026). For step-by-step instructions to build a custom EKS AL2-optimized and AL2-accelerated AMI, see [Build a custom Amazon Linux AMI](https://docs.aws.amazon.com/eks/latest/userguide/eks-ami-build-scripts.html) in EKS official documentation.

### Does the EKS Kubernetes version support policy apply to Amazon Linux distributions?
<a name="_does_the_eks_kubernetes_version_support_policy_apply_to_amazon_linux_distributions"></a>

No. The EOS date for EKS AL2-optimized and AL2-accelerated AMIs is independent of the standard and extended support timelines for Kubernetes versions by EKS. You need to migrate to AL2023 or Bottlerocket even if you are using EKS extended support.

### How does the shift from cgroupv1 to cgroupv2 affect my migration?
<a name="_how_does_the_shift_from_cgroupv1_to_cgroupv2_affect_my_migration"></a>

The [Kubernetes community](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/4569-cgroup-v1-maintenance-mode/README.md) moved `cgroupv1` support (used by AL2) into maintenance mode, meaning no new features will be added and only critical security and major bug fixes will be provided. To adopt `cgroupv2` in Kubernetes, you need to ensure compatibility across the OS, kernel, container runtime, and Kubernetes components. This requires a Linux distribution that enables `cgroupv2` by default, such as AL2023, Bottlerocket, Red Hat Enterprise Linux (RHEL) 9+, Ubuntu 22.04+, or Debian 11+. These distributions ship with kernel versions ≥5.8, which is the minimum requirement for `cgroupv2` support in Kubernetes. To learn more, see [About cgroup v2](https://kubernetes.io/docs/concepts/architecture/cgroups/).
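To check which cgroup version a node is actually running, you can inspect the file system type mounted at `/sys/fs/cgroup` (run the command on the node, for example through SSM or a debug Pod).

```
stat -fc %T /sys/fs/cgroup/
# cgroup2fs -> the node is using cgroupv2 (AL2023, Bottlerocket)
# tmpfs     -> the node is using cgroupv1 (AL2)
```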

### What do I do if I need Neuron in my custom AL2 AMI?
<a name="_what_do_i_do_if_i_need_neuron_in_my_custom_al2_ami"></a>

You cannot run your full Neuron-powered applications natively on AL2-based AMIs. To leverage AWS Neuron on an AL2 AMI, you must containerize your applications using a Neuron-supported container with a non-AL2 Linux distribution (for example, Ubuntu 22.04 or Amazon Linux 2023) and then deploy those containers on an AL2-based AMI that has the Neuron Driver (`aws-neuronx-dkms`) installed.

### Should I switch to a bare Amazon Linux 2 base instance after the EKS AL2 AMI EOS date (November 26, 2025)?
<a name="_should_i_switch_to_a_bare_amazon_linux_2_base_instance_after_the_eks_al2_ami_eos_date_november_26_2025"></a>

Switching to a bare Amazon Linux 2 base instance lacks the specific optimizations, container runtime configurations, and customizations provided by the official EKS AL2-optimized and AL2-accelerated AMIs. Instead, if you must continue using an AL2-based solution, we recommend building a custom AMI using the EKS AMI recipes at [Build a custom EKS-optimized Amazon Linux AMI](eks-ami-build-scripts.md) or [Amazon EKS AMI Build Specification](https://github.com/awslabs/amazon-eks-ami). This ensures compatibility with your existing workloads and includes AL2 kernel updates until the Amazon Linux 2 EOS date (June 30, 2026).

### When building a custom AL2 AMI using the EKS AMI GitHub repository after the EKS AL2 AMI EOS date (November 26, 2025), what support is available for packages from repositories like amzn2-core and amzn2extra-docker?
<a name="_when_building_a_custom_al2_ami_using_the_eks_ami_github_repository_after_the_eks_al2_ami_eos_date_november_26_2025_what_support_is_available_for_packages_from_repositories_like_amzn2_core_and_amzn2extra_docker"></a>

The EKS AMI recipe at [Amazon EKS AMI Build Specification](https://github.com/awslabs/amazon-eks-ami) pulls packages via YUM from standard Amazon Linux 2 repositories such as [amzn2-core](https://docs.aws.amazon.com/linux/al2/ug/managing-software.html) and [amzn2extra-docker](https://docs.aws.amazon.com/linux/al2/ug/managing-software.html). After the EKS AL2 AMI EOS date (November 26, 2025), these repositories will continue to be supported until the broader Amazon Linux 2 EOS date (June 30, 2026). Note that support is limited to kernel updates during this period, meaning you will need to manually manage and apply other package updates, security patches, and any non-kernel dependencies to maintain security and compatibility.

### Why might Java applications using older versions of JDK8 on Amazon EKS with AL2023 experience Out of Memory (OOM) exceptions and pod restarts, and how can this be resolved?
<a name="_why_might_java_applications_using_older_versions_of_jdk8_on_amazon_eks_with_al2023_experience_out_of_memory_oom_exceptions_and_pod_restarts_and_how_can_this_be_resolved"></a>

When running on Amazon EKS nodes with AL2023, Java applications relying on JDK 8 versions prior to `jdk8u372` can cause OOM exceptions and pod restarts because the JVM is not compatible with `cgroupv2`. This issue arises specifically from the JVM’s inability to detect container memory limits using `cgroupv2`, the default in Amazon Linux 2023. As a result, it bases heap allocation on the node’s total memory rather than the pod’s defined limit. This stems from `cgroupv2` changing the storage location for memory limit data, causing older Java versions to misread available memory and assume node-level resources. A few possible options include:
+  **Upgrade JDK version**: Upgrading to `jdk8u372` or later, or to a newer JDK version with full `cgroupv2` support, can resolve this issue. For a list of compatible Java versions that fully support `cgroupv2`, see [About cgroup v2](https://kubernetes.io/docs/concepts/architecture/cgroups/).
+  **Build a custom AMI**: If you must continue using an AL2-based solution, you can build a custom AL2-based AMI (until November 26, 2025) using [Build a custom EKS-optimized Amazon Linux AMI](eks-ami-build-scripts.md) or [Amazon EKS AMI Build Specification](https://github.com/awslabs/amazon-eks-ami). For example, you can build an AL2-based v1.33 AMI (until November 26, 2025). Amazon EKS will provide AL2-based AMIs until the EKS AL2 EOS date (November 26, 2025). After the EOS date (November 26, 2025), you will need to build your own AMI.
+  **Enable cgroupv1**: If you must continue using `cgroupv1`, you can enable `cgroupv1` on an EKS AL2023 AMI. To enable it, run `sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"` and reboot the system (for example, the EC2 instance or node running Amazon Linux 2023). This modifies the boot parameters for the system by adding the kernel parameter `systemd.unified_cgroup_hierarchy=0` to the GRUB configuration, which instructs systemd to use the legacy `cgroupv1` hierarchy, and enables `cgroupv1`. Note that when you run this grubby command, you are reconfiguring the kernel to launch with `cgroupv1` enabled and `cgroupv2` disabled. Only one of these cgroup versions is used for active resource management on a node. This is not the same as running `cgroupv2` with backwards compatibility for the `cgroupv1` API.

**Warning**  
We do not recommend the continued use of `cgroupv1`. Instead, we recommend migrating to `cgroupv2`. The Kubernetes community moved `cgroupv1` support (used by AL2) into maintenance mode, meaning no new features or updates will be added and only critical security and major bug fixes will be provided. The full removal of `cgroupv1` support is expected in a future release, though a specific date has not yet been announced. If you experience issues with `cgroupv1`, AWS will be unable to provide support and recommends that you upgrade to `cgroupv2`.

## Compatibility and versions
<a name="_compatibility_and_versions"></a>

### Supported Kubernetes versions for AL2 AMIs
<a name="_supported_kubernetes_versions_for_al2_amis"></a>

Kubernetes version 1.32 is the last version for which Amazon EKS will release AL2 (Amazon Linux 2) AMIs. For [supported](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html) Kubernetes versions up to 1.32, EKS will continue to release AL2 AMIs (AL2_ARM_64, AL2_x86_64) and AL2-accelerated AMIs (AL2_x86_64_GPU) until November 26, 2025. After this date, EKS will stop releasing AL2-optimized and AL2-accelerated AMIs for all Kubernetes versions. Note that the EOS date for EKS AL2-optimized and AL2-accelerated AMIs is independent of the standard and extended support timelines for Kubernetes versions by EKS.
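
While AL2 AMIs are still published, you can retrieve the latest recommended AL2 AMI ID for a supported Kubernetes version from AWS Systems Manager Parameter Store, in the same way shown later for AL2023 AMIs in [Retrieve recommended Amazon Linux AMI IDs](retrieve-ami-id.md). The following is a minimal sketch, assuming Kubernetes version 1.32 and the `us-west-2` Region; the parameter may no longer resolve after the EOS date.

```
# Query the latest recommended EKS-optimized AL2 AMI ID for Kubernetes 1.32.
aws ssm get-parameter \
    --name /aws/service/eks/optimized-ami/1.32/amazon-linux-2/recommended/image_id \
    --region us-west-2 --query "Parameter.Value" --output text
```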

### Supported drivers and Linux kernel versions comparison for AL2, AL2023, and Bottlerocket AMIs
<a name="_supported_drivers_and_linux_kernel_versions_comparison_for_al2_al2023_and_bottlerocket_amis"></a>


| Component | EKS AL2 AMI | EKS AL2023 AMI | EKS Bottlerocket AMI | 
| --- | --- | --- | --- | 
|  Base OS Compatibility  |  RHEL7/CentOS 7  |  Fedora/CentOS 9  |  N/A  | 
|   [CUDA user mode driver](https://docs.nvidia.com/deploy/cuda-compatibility/why-cuda-compatibility.html#why-cuda-compatibility)   |  12.x  |  12.x,13.x  |  12.x,13.x  | 
|  NVIDIA GPU Driver  |  R570  |  R580  |  R570, R580  | 
|  AWS Neuron Driver  |  2.20+  |  2.20+  |  2.20+  | 
|  Linux Kernel  |  5.10  |  6.1, 6.12  |  6.1, 6.12  | 

For more information on NVIDIA driver and CUDA compatibility, see the [NVIDIA documentation](https://docs.nvidia.com/datacenter/tesla/drivers/index.html#supported-drivers-and-cuda-toolkit-versions).
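
If you want to compare a running node against this table, you can check its kernel and (on accelerated AMIs) its NVIDIA driver directly on the node. A minimal sketch; `nvidia-smi` is only present on accelerated AMIs.

```
# Print the running Linux kernel version (for example, 5.10.x on AL2, 6.1.x or 6.12.x on AL2023).
uname -r

# On accelerated AMIs, print the installed NVIDIA driver version.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```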

### AWS Neuron compatibility with AL2 AMIs
<a name="shared_aws_neuron_compatibility_with_al2_amis"></a>

Starting from [AWS Neuron release 2.20](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/rn.html#neuron-2-20-0-whatsnew), the Neuron Runtime (`aws-neuronx-runtime-lib`) used by EKS AL-based AMIs no longer supports Amazon Linux 2 (AL2). The Neuron Driver (`aws-neuronx-dkms`) is now the only AWS Neuron package that supports Amazon Linux 2. This means you cannot run your Neuron-powered applications natively on an AL2-based AMI. To set up Neuron on AL2023 AMIs, see the [AWS Neuron Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/index.html#setup-guide-index) guide.
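
To see which of these Neuron packages are installed on an existing node, you can query the RPM database. A minimal sketch, using the package names mentioned above:

```
# Check for the Neuron driver and runtime packages on the node.
# On AL2-based nodes, only the driver (aws-neuronx-dkms) remains supported.
rpm -q aws-neuronx-dkms aws-neuronx-runtime-lib
```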

### Kubernetes compatibility with AL2 AMIs
<a name="_kubernetes_compatibility_with_al2_amis"></a>

The Kubernetes community has moved `cgroupv1` support (used by AL2) to maintenance mode. This means no new features will be added, and only critical security and major bug fixes will be provided. Any Kubernetes features relying on `cgroupv2`, such as MemoryQoS and enhanced resource isolation, are unavailable on AL2. Furthermore, Amazon EKS Kubernetes version 1.32 was the last version to support AL2 AMIs. To maintain compatibility with the latest Kubernetes versions, we recommend migrating to AL2023 or Bottlerocket, which enable `cgroupv2` by default.

### Linux version compatibility with AL2 AMIs
<a name="_linux_version_compatibility_with_al2_amis"></a>

Amazon Linux 2 (AL2) is supported by AWS until its end-of-support (EOS) date on June 30, 2026. However, as AL2 has aged, support from the broader Linux community for new applications and functionality has become more limited. AL2 AMIs are based on [Linux kernel 5.10](https://docs.aws.amazon.com/linux/al2/ug/kernel.html), while AL2023 uses [Linux kernel 6.1](https://docs.aws.amazon.com/linux/al2023/ug/compare-with-al2-kernel.html). As a result, many upstream Linux packages and tools need to be backported to work with AL2’s older kernel version, some modern Linux features and security improvements aren’t available due to the older kernel, and many open source projects have deprecated or limited their support for older kernel versions like 5.10.

### Deprecated packages not included in AL2023
<a name="_deprecated_packages_not_included_in_al2023"></a>

Some of the most common packages that are not included in AL2023, or that changed in AL2023, include:
+ Some [source binary packages in Amazon Linux 2](https://docs.aws.amazon.com/linux/al2023/release-notes/removed-AL2023.6-AL2.html) are no longer available in Amazon Linux 2023
+ Changes in how Amazon Linux supports different versions of packages in AL2023 (for example, the [amazon-linux-extras system](https://repost.aws/questions/QUWGU3VFJMRSGf6MDPWn4tLg/how-to-resolve-amazon-linux-extras-in-al2023) is not available in AL2023)
+  [Extra Packages for Enterprise Linux (EPEL)](https://docs.aws.amazon.com/linux/al2023/ug/epel.html) is not supported in AL2023
+  [32-bit applications](https://docs.aws.amazon.com/linux/al2023/ug/deprecated-al2.html#deprecated-32bit-rpms) are not supported in AL2023

To learn more, see [Comparing AL2 and AL2023](https://docs.aws.amazon.com/linux/al2023/ug/compare-with-al2.html).

### FIPS validation comparison across AL2, AL2023, and Bottlerocket
<a name="_fips_validation_comparison_across_al2_al2023_and_bottlerocket"></a>

Amazon Linux 2 (AL2), Amazon Linux 2023 (AL2023), and Bottlerocket provide support for Federal Information Processing Standards (FIPS) compliance.
+ AL2 is certified under FIPS 140-2 and AL2023 is certified under FIPS 140-3. To enable FIPS mode on AL2023, install the necessary packages on your Amazon EC2 instance and follow the configuration steps in [Enable FIPS Mode on AL2023](https://docs.aws.amazon.com/linux/al2023/ug/fips-mode.html) (a minimal command sketch follows this list). To learn more, see [AL2023 FAQs](https://aws.amazon.com/linux/amazon-linux-2023/faqs).
+ Bottlerocket provides purpose-built variants specifically for FIPS which constrain the kernel and userspace components to the use of cryptographic modules that have been submitted to the FIPS 140-3 Cryptographic Module Validation Program.
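
The following is a minimal sketch of the AL2023 FIPS enablement steps referenced above, run as root on the instance. Package names and commands are taken from the AL2023 documentation; confirm the exact procedure against [Enable FIPS Mode on AL2023](https://docs.aws.amazon.com/linux/al2023/ug/fips-mode.html) before relying on it.

```
# Install the crypto policy tooling, switch the system policy to FIPS, and reboot.
sudo dnf -y install crypto-policies crypto-policies-scripts
sudo fips-mode-setup --enable
sudo reboot

# After the reboot, verify that FIPS mode is enabled.
fips-mode-setup --check
```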

### EKS AMI driver and versions changelog
<a name="_eks_ami_driver_and_versions_changelog"></a>

For a complete list of all EKS AMI components and their versions, see [Amazon EKS AMI Release Notes](https://github.com/awslabs/amazon-eks-ami/releases) on GitHub.

# Create nodes with optimized Amazon Linux AMIs
<a name="eks-optimized-ami"></a>

Amazon Elastic Kubernetes Service (Amazon EKS) provides specialized Amazon Machine Images (AMIs) optimized for running Kubernetes worker nodes. These EKS-optimized Amazon Linux (AL) AMIs are pre-configured with essential components—such as `kubelet`, the AWS IAM Authenticator, and `containerd`—to ensure seamless integration and security within your clusters. This guide details the available AMI versions and outlines specialized options for accelerated computing and Arm-based architectures.

## Considerations
<a name="ami-considerations"></a>
+ You can track security or privacy events for Amazon Linux at the [Amazon Linux security center](https://alas.aws.amazon.com/) by choosing the tab for your desired version. You can also subscribe to the applicable RSS feed. Security and privacy events include an overview of the issue, what packages are affected, and how to update your instances to correct the issue.
+ Before deploying an accelerated or Arm AMI, review the information in [Amazon EKS-optimized accelerated Amazon Linux AMIs](#gpu-ami) and [Amazon EKS-optimized Arm Amazon Linux AMIs](#arm-ami).
+ Amazon EC2 `P2` instances aren’t supported on Amazon EKS because they require `NVIDIA` driver version 470 or earlier.
+ Any newly created managed node groups in clusters on version `1.30` or newer will automatically default to using AL2023 as the node operating system.

## Amazon EKS-optimized accelerated Amazon Linux AMIs
<a name="gpu-ami"></a>

Amazon EKS-optimized accelerated Amazon Linux (AL) AMIs are built on top of the standard EKS-optimized Amazon Linux AMIs. They are configured to serve as optional images for Amazon EKS nodes to support GPU, [Inferentia](https://aws.amazon.com/machine-learning/inferentia/), and [Trainium](https://aws.amazon.com/machine-learning/trainium/) based workloads.

For more information, see [Use EKS-optimized accelerated AMIs for GPU instances](ml-eks-optimized-ami.md).

## Amazon EKS-optimized Arm Amazon Linux AMIs
<a name="arm-ami"></a>

Arm instances deliver significant cost savings for scale-out and Arm-based applications such as web servers, containerized microservices, caching fleets, and distributed data stores. When adding Arm nodes to your cluster, review the following considerations.
+ If your cluster was deployed before August 17, 2020, you must do a one-time upgrade of critical cluster add-on manifests. This is so that Kubernetes can pull the correct image for each hardware architecture in use in your cluster. For more information about updating cluster add-ons, see [Step 1: Prepare for upgrade](update-cluster.md#update-existing-cluster). If you deployed your cluster on or after August 17, 2020, then your CoreDNS, `kube-proxy`, and Amazon VPC CNI plugin for Kubernetes add-ons are already multi-architecture capable.
+ Applications deployed to Arm nodes must be compiled for Arm.
+ If you have DaemonSets that are deployed in an existing cluster, or you want to deploy them to a new cluster that you also want to deploy Arm nodes in, then verify that your DaemonSet can run on all hardware architectures in your cluster.
+ You can run Arm node groups and x86 node groups in the same cluster. If you do, consider deploying multi-architecture container images to a container repository such as Amazon Elastic Container Registry, and then adding node selectors to your manifests so that Kubernetes knows which hardware architecture a Pod can be deployed to (see the sketch after this list). For more information, see [Pushing a multi-architecture image](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-multi-architecture-image.html) in the *Amazon ECR User Guide* and the [Introducing multi-architecture container images for Amazon ECR](https://aws.amazon.com/blogs/containers/introducing-multi-architecture-container-images-for-amazon-ecr) blog post.
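
As a minimal sketch of the node selector approach mentioned in the last item above, the following Deployment pins its Pods to Arm nodes by using the well-known `kubernetes.io/arch` node label. The name and image are placeholders; substitute a multi-architecture image of your own.

```
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: arm64-example                  # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: arm64-example
  template:
    metadata:
      labels:
        app: arm64-example
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64      # schedule only onto Arm nodes
      containers:
        - name: app
          image: public.ecr.aws/docker/library/nginx:latest   # placeholder; assumed to be multi-architecture
EOF
```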

## More information
<a name="linux-more-information"></a>

For more information about using Amazon EKS-optimized Amazon Linux AMIs, see the following sections:
+ To use Amazon Linux with managed node groups, see [Simplify node lifecycle with managed node groups](managed-node-groups.md).
+ To launch self-managed Amazon Linux nodes, see [Retrieve recommended Amazon Linux AMI IDs](retrieve-ami-id.md).
+ For version information, see [Retrieve Amazon Linux AMI version information](eks-linux-ami-versions.md).
+ To retrieve the latest IDs of the Amazon EKS-optimized Amazon Linux AMIs, see [Retrieve recommended Amazon Linux AMI IDs](retrieve-ami-id.md).
+ For open-source scripts that are used to build the Amazon EKS-optimized AMIs, see [Build a custom EKS-optimized Amazon Linux AMI](eks-ami-build-scripts.md).

# Upgrade from Amazon Linux 2 to Amazon Linux 2023
<a name="al2023"></a>

**Warning**  
Amazon EKS stopped publishing EKS-optimized Amazon Linux 2 (AL2) AMIs on November 26, 2025. AL2023 and Bottlerocket based AMIs for Amazon EKS are available for all supported Kubernetes versions including 1.33 and higher.

AL2023 is a Linux-based operating system designed to provide a secure, stable, and high-performance environment for your cloud applications. It’s the next generation of Amazon Linux from Amazon Web Services and is available across all supported Amazon EKS versions.

AL2023 offers several improvements over AL2. For a full comparison, see [Comparing AL2 and Amazon Linux 2023](https://docs.aws.amazon.com/linux/al2023/ug/compare-with-al2.html) in the *Amazon Linux 2023 User Guide*. Several packages have been added, upgraded, and removed from AL2. It’s highly recommended to test your applications with AL2023 before upgrading. For a list of all package changes in AL2023, see [Package changes in Amazon Linux 2023](https://docs.aws.amazon.com/linux/al2023/release-notes/compare-packages.html) in the *Amazon Linux 2023 Release Notes*.

In addition to these changes, you should be aware of the following:
+ AL2023 introduces a new node initialization process `nodeadm` that uses a YAML configuration schema. If you’re using self-managed node groups or an AMI with a launch template, you’ll now need to provide additional cluster metadata explicitly when creating a new node group. An [example](https://awslabs.github.io/amazon-eks-ami/nodeadm/) of the minimum required parameters is as follows, where `apiServerEndpoint`, `certificateAuthority`, and service `cidr` are now required:

  ```
  ---
  apiVersion: node.eks.aws/v1alpha1
  kind: NodeConfig
  spec:
    cluster:
      name: my-cluster
      apiServerEndpoint: https://example.com
      certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
      cidr: 10.100.0.0/16
  ```

  In AL2, the metadata from these parameters was discovered from the Amazon EKS `DescribeCluster` API call. With AL2023, this behavior has changed since the additional API call risks throttling during large node scale ups. This change doesn’t affect you if you’re using managed node groups without a launch template or if you’re using Karpenter. For more information on `certificateAuthority` and service `cidr`, see [https://docs.aws.amazon.com/eks/latest/APIReference/API_DescribeCluster.html](https://docs.aws.amazon.com/eks/latest/APIReference/API_DescribeCluster.html) in the *Amazon EKS API Reference*.
+ For AL2023, `nodeadm` also changes how parameters are applied to the `kubelet` on each node, using [https://awslabs.github.io/amazon-eks-ami/nodeadm/doc/api/#nodeconfigspec](https://awslabs.github.io/amazon-eks-ami/nodeadm/doc/api/#nodeconfigspec). In AL2, this was done with the `--kubelet-extra-args` parameter. This is commonly used to add labels and taints to nodes. The example below shows applying `maxPods` and `--node-labels` to the node.

  ```
  ---
  apiVersion: node.eks.aws/v1alpha1
  kind: NodeConfig
  spec:
    cluster:
      name: test-cluster
      apiServerEndpoint: https://example.com
      certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
      cidr: 10.100.0.0/16
    kubelet:
      config:
        maxPods: 110
      flags:
        - --node-labels=karpenter.sh/capacity-type=on-demand,karpenter.sh/nodepool=test
  ```
+ Amazon VPC CNI version `1.16.2` or greater is required for AL2023.
+ AL2023 requires `IMDSv2` by default. `IMDSv2` has several benefits that help improve security posture. It uses a session-oriented authentication method that requires the creation of a secret token in a simple HTTP PUT request to start the session. A session’s token can be valid for anywhere between 1 second and 6 hours. For more information on how to transition from `IMDSv1` to `IMDSv2`, see [Transition to using Instance Metadata Service Version 2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-metadata-transition-to-version-2.html) and [Get the full benefits of IMDSv2 and disable IMDSv1 across your AWS infrastructure](https://aws.amazon.com/blogs/security/get-the-full-benefits-of-imdsv2-and-disable-imdsv1-across-your-aws-infrastructure). If you would like to use `IMDSv1`, you can still do so by manually overriding the settings using instance metadata option launch properties.
**Note**  
For `IMDSv2` with AL2023, the default hop count for managed node groups can vary:  
When not using a launch template, the default is set to `1`. This means that containers won’t have access to the node’s credentials using IMDS. If you require container access to the node’s credentials, you can still do so by using a [custom Amazon EC2 launch template](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-launchtemplate-metadataoptions.html).
When using a custom AMI in a launch template, the default `HttpPutResponseHopLimit` is set to `2`. You can manually override the `HttpPutResponseHopLimit` in the launch template, or adjust it on a running instance as shown in the example after this list.
Alternatively, you can use [Amazon EKS Pod Identity](pod-identities.md) to provide credentials instead of `IMDSv2`.
+ AL2023 features the next generation of unified control group hierarchy (`cgroupv2`). `cgroupv2` is used by the container runtime and by `systemd`. While AL2023 still includes code that can make the system run using `cgroupv1`, this isn’t a recommended or supported configuration. This configuration will be completely removed in a future major release of Amazon Linux.
+  `eksctl` version `0.176.0` or greater is required to support AL2023.
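
Relating to the IMDSv2 note above, if containers on an existing AL2023 node need access to IMDS, you can raise the hop limit on that instance while keeping IMDSv2 tokens required. A minimal sketch, assuming a placeholder instance ID:

```
# Require IMDSv2 tokens and allow responses to travel two hops so containers can reach IMDS.
aws ec2 modify-instance-metadata-options \
    --instance-id i-1234567890abcdef0 \
    --http-tokens required \
    --http-put-response-hop-limit 2
```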

For previously existing managed node groups, you can either perform an in-place upgrade or a blue/green upgrade depending on how you’re using a launch template:
+ If you’re using a custom AMI with a managed node group, you can perform an in-place upgrade by swapping the AMI ID in the launch template. You should ensure that your applications and any user data transfer over to AL2023 first before performing this upgrade strategy.
+ If you’re using managed node groups with either the standard launch template or with a custom launch template that doesn’t specify the AMI ID, you’re required to upgrade using a blue/green strategy. A blue/green upgrade is typically more complex and involves creating an entirely new node group where you specify AL2023 as the AMI type (see the example after this list). The new node group then needs to be carefully configured to ensure that all custom data from the AL2 node group is compatible with the new OS. Once the new node group has been tested and validated with your applications, Pods can be migrated from the old node group to the new node group. Once the migration is complete, you can delete the old node group.
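
For the blue/green path, the replacement node group can be created with an AL2023 AMI type and validated before you drain the old AL2 nodes. The following is a minimal sketch with the AWS CLI; the cluster name, node group name, IAM role, and subnet IDs are placeholders.

```
# Create a new AL2023-based managed node group alongside the existing AL2 node group.
aws eks create-nodegroup \
    --cluster-name my-cluster \
    --nodegroup-name al2023-ng \
    --ami-type AL2023_x86_64_STANDARD \
    --node-role arn:aws:iam::111122223333:role/myNodeRole \
    --subnets subnet-0abc1234def567890 subnet-0123456789abcdef0 \
    --scaling-config minSize=2,maxSize=3,desiredSize=2
```

After the new node group is validated, cordon and drain the old nodes (for example, with `kubectl drain`) before deleting the AL2 node group.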

If you’re using Karpenter and want to use AL2023, you need to set the `EC2NodeClass` `amiFamily` field to AL2023. By default, Drift is enabled in Karpenter. This means that once the `amiFamily` field has been changed, Karpenter automatically updates your worker nodes to the latest AMI when it’s available.
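
As a minimal sketch of that Karpenter change, assuming the `karpenter.k8s.aws/v1beta1` `EC2NodeClass` API and placeholder role and discovery tags (check the Karpenter documentation for the exact schema of the version you run):

```
cat <<'EOF' | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default                        # hypothetical name
spec:
  amiFamily: AL2023                    # switch provisioned nodes from AL2 to AL2023
  role: KarpenterNodeRole-my-cluster   # hypothetical node IAM role
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
EOF
```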

## Additional information about nodeadm
<a name="_additional_information_about_nodeadm"></a>

When using an EKS-optimized Amazon Linux 2023 AMI, or building a custom EKS Amazon Linux 2023 AMI with the Packer scripts provided in the official `amazon-eks-ami` GitHub repository, avoid explicitly running `nodeadm init` within EC2 user data or as part of your custom AMI.

If you want to generate dynamic NodeConfig in your user data, you can write that configuration to a drop-in YAML or JSON file in `/etc/eks/nodeadm.d`. These configuration files are merged and applied to your node when `nodeadm init` is automatically started later in the boot process. For example:

```
cat > /etc/eks/nodeadm.d/additional-node-labels.yaml << EOF
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  kubelet:
    flags:
      - --node-labels=foo=bar
EOF
```

The EKS-optimized Amazon Linux 2023 AMIs automatically execute `nodeadm init` in two phases through separate systemd services: `nodeadm-config` runs before user data execution, while `nodeadm-run` executes afterward. The `nodeadm-config` service establishes baseline configurations for `containerd` and `kubelet` before user data runs. The `nodeadm-run` service runs select system daemons and completes any final configurations following user data execution. If the `nodeadm init` command is run an additional time, via user data or a custom AMI, it may break assumptions about execution ordering, leading to unexpected outcomes including misconfigured ENIs.

# Retrieve Amazon Linux AMI version information
<a name="eks-linux-ami-versions"></a>

Amazon EKS optimized Amazon Linux AMIs are versioned by Kubernetes version and the release date of the AMI in the following format:

```
k8s_major_version.k8s_minor_version.k8s_patch_version-release_date
```

Each AMI release includes various versions of [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/), the Linux kernel, and [containerd](https://containerd.io/). The accelerated AMIs also include various versions of the NVIDIA driver. You can find this version information in the [Changelog](https://github.com/awslabs/amazon-eks-ami/blob/main/CHANGELOG.md) on GitHub.
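
In addition to the changelog, you can query the recommended AMI's version string (in the format shown above) from AWS Systems Manager Parameter Store. The following is a minimal sketch, assuming Kubernetes version 1.31, the `us-west-2` Region, and that the `release_version` sub-parameter is published for this AMI type:

```
# Retrieve the release version string of the latest recommended AL2023 AMI.
aws ssm get-parameter \
    --name /aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/standard/recommended/release_version \
    --region us-west-2 --query "Parameter.Value" --output text
```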

# Retrieve recommended Amazon Linux AMI IDs
<a name="retrieve-ami-id"></a>

When deploying nodes, you can specify an ID for a pre-built Amazon EKS optimized Amazon Machine Image (AMI). To retrieve an AMI ID that fits your desired configuration, query the AWS Systems Manager Parameter Store API. Using this API eliminates the need to manually look up Amazon EKS optimized AMI IDs. For more information, see [GetParameter](https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_GetParameter.html). The [IAM principal](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html#iam-term-principal) that you use must have the `ssm:GetParameter` IAM permission to retrieve the Amazon EKS optimized AMI metadata.

You can retrieve the image ID of the latest recommended Amazon EKS optimized Amazon Linux AMI with the following command, which uses the sub-parameter `image_id`. Make the following modifications to the command as needed and then run the modified command:
+ Replace `<kubernetes-version>` with an [Amazon EKS supported version](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html).
+ Replace `<ami-type>` with one of the following options. For information about the types of Amazon EC2 instances, see [Amazon EC2 instance types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html).
  + Use *amazon-linux-2023/x86_64/standard* for Amazon Linux 2023 (AL2023) `x86` based instances.
  + Use *amazon-linux-2023/arm64/standard* for AL2023 ARM instances, such as [AWS Graviton](https://aws.amazon.com/ec2/graviton/) based instances.
  + Use *amazon-linux-2023/x86_64/nvidia* for the latest approved AL2023 NVIDIA `x86` based instances.
  + Use *amazon-linux-2023/arm64/nvidia* for the latest approved AL2023 NVIDIA `arm64` based instances.
  + Use *amazon-linux-2023/x86_64/neuron* for the latest AL2023 [AWS Neuron](https://aws.amazon.com/machine-learning/neuron/) instances.
+ Replace `<region-code>` with an [Amazon EKS supported AWS Region](https://docs.aws.amazon.com/general/latest/gr/eks.html) for which you want the AMI ID.

```
aws ssm get-parameter --name /aws/service/eks/optimized-ami/<kubernetes-version>/<ami-type>/recommended/image_id \
    --region <region-code> --query "Parameter.Value" --output text
```

Here’s an example command after placeholder replacements have been made.

```
aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/standard/recommended/image_id \
    --region us-west-2 --query "Parameter.Value" --output text
```

An example output is as follows.

```
ami-1234567890abcdef0
```

# Build a custom EKS-optimized Amazon Linux AMI
<a name="eks-ami-build-scripts"></a>

**Warning**  
Amazon EKS stopped publishing EKS-optimized Amazon Linux 2 (AL2) AMIs on November 26, 2025. AL2023 and Bottlerocket based AMIs for Amazon EKS are available for all supported Kubernetes versions including 1.33 and higher.

Amazon EKS provides open-source build scripts in the [Amazon EKS AMI Build Specification](https://github.com/awslabs/amazon-eks-ami) repository that you can use to view the configurations for `kubelet`, the runtime, and the AWS IAM Authenticator for Kubernetes, and to build your own AL-based AMI from scratch.

This repository contains the specialized [bootstrap script for AL2](https://github.com/awslabs/amazon-eks-ami/blob/main/templates/al2/runtime/bootstrap.sh) and [nodeadm tool for AL2023](https://awslabs.github.io/amazon-eks-ami/nodeadm/) that runs at boot time. These scripts configure your instance’s certificate data, control plane endpoint, cluster name, and more. The scripts are considered the source of truth for Amazon EKS-optimized AMI builds, so you can follow the GitHub repository to monitor changes to our AMIs.

When building custom AMIs with the EKS-optimized AMIs as the base, it is not recommended or supported to run an operating system upgrade (for example, `dnf upgrade`) or to upgrade any of the Kubernetes or GPU packages that are included in the EKS-optimized AMIs, as this risks breaking component compatibility. If you do upgrade the operating system or packages that are included in the EKS-optimized AMIs, it is recommended to thoroughly test in a development or staging environment before deploying to production.

When building custom AMIs for GPU instances, it is recommended to build separate custom AMIs for each instance type generation and family that you will run. The EKS-optimized accelerated AMIs selectively install drivers and packages at runtime based on the underlying instance type generation and family. For more information, see the EKS AMI scripts for [installation](https://github.com/awslabs/amazon-eks-ami/blob/main/templates/al2023/provisioners/install-nvidia-driver.sh) and [runtime](https://github.com/awslabs/amazon-eks-ami/blob/main/templates/al2023/runtime/gpu/nvidia-kmod-load.sh).

## Prerequisites
<a name="_prerequisites"></a>
+  [Install the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) 
+  [Install HashiCorp Packer v1.9.4 or later](https://developer.hashicorp.com/packer/downloads) 
+  [Install GNU Make](https://www.gnu.org/software/make/) 

## Quickstart
<a name="_quickstart"></a>

This quickstart shows you the commands to create a custom AMI in your AWS account. To learn more about the configurations available to customize your AMI, see the template variables on the [Amazon Linux 2023](https://awslabs.github.io/amazon-eks-ami/usage/al2023/) page.

### Prerequisites
<a name="_prerequisites_2"></a>

Install the required [Amazon plugin](https://developer.hashicorp.com/packer/integrations/hashicorp/amazon). For example:

```
packer plugins install github.com/hashicorp/amazon
```

### Step 1. Set up your environment
<a name="_step_1_setup_your_environment"></a>

Clone or fork the official Amazon EKS AMI repository. For example:

```
git clone https://github.com/awslabs/amazon-eks-ami.git
cd amazon-eks-ami
```

Verify that Packer is installed:

```
packer --version
```

### Step 2. Create a custom AMI
<a name="_step_2_create_a_custom_ami"></a>

The following are example commands for various custom AMIs.

 **Basic NVIDIA AL2 AMI:** 

```
make k8s=1.31 os_distro=al2 \
  enable_accelerator=nvidia \
  nvidia_driver_major_version=560 \
  enable_efa=true
```

 **Basic NVIDIA AL2023 AMI:** 

```
make k8s=1.31 os_distro=al2023 \
  enable_accelerator=nvidia \
  nvidia_driver_major_version=560 \
  enable_efa=true
```

 **STIG-Compliant Neuron AL2023 AMI:** 

```
make k8s=1.31 os_distro=al2023 \
  enable_accelerator=neuron \
  enable_fips=true \
  source_ami_id=ami-0abcd1234efgh5678 \
  kms_key_id=alias/aws-stig
```

After you run these commands, Packer does the following:
+ Launches a temporary Amazon EC2 instance.
+ Installs Kubernetes components, drivers, and configurations.
+ Creates the AMI in your AWS account.

The expected output should look like this:

```
==> Wait completed after 8 minutes 42 seconds

==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
us-west-2: ami-0e139a4b1a7a9a3e9
```

### Step 3. View default values
<a name="_step_3_view_default_values"></a>

To view default values and additional options, run the following command:

```
make help
```

# Create nodes with optimized Bottlerocket AMIs
<a name="eks-optimized-ami-bottlerocket"></a>

 [Bottlerocket](https://aws.amazon.com/bottlerocket/) is an open source Linux distribution that’s sponsored and supported by AWS. Bottlerocket is purpose-built for hosting container workloads. With Bottlerocket, you can improve the availability of containerized deployments and reduce operational costs by automating updates to your container infrastructure. Bottlerocket includes only the essential software to run containers, which improves resource usage, reduces security threats, and lowers management overhead. The Bottlerocket AMI includes `containerd`, `kubelet`, and AWS IAM Authenticator. In addition to managed node groups and self-managed nodes, Bottlerocket is also supported by [Karpenter](https://karpenter.sh/).

## Advantages
<a name="bottlerocket-advantages"></a>

Using Bottlerocket with your Amazon EKS cluster has the following advantages:
+  **Higher uptime with lower operational cost and lower management complexity** – Bottlerocket has a smaller resource footprint, shorter boot times, and is less vulnerable to security threats than other Linux distributions. Bottlerocket’s smaller footprint helps to reduce costs by using less storage, compute, and networking resources.
+  **Improved security from automatic OS updates** – Updates to Bottlerocket are applied as a single unit which can be rolled back, if necessary. This removes the risk of corrupted or failed updates that can leave the system in an unusable state. With Bottlerocket, security updates can be automatically applied as soon as they’re available in a minimally disruptive manner and be rolled back if failures occur.
+  **Premium support** – AWS-provided builds of Bottlerocket on Amazon EC2 are covered under the same AWS Support plans that also cover AWS services such as Amazon EC2, Amazon EKS, and Amazon ECR.

## Considerations
<a name="bottlerocket-considerations"></a>

Consider the following when using Bottlerocket for your AMI type:
+ Bottlerocket supports Amazon EC2 instances with `x86_64` and `arm64` processors.
+ Bottlerocket supports Amazon EC2 instances with GPUs. For more information, see [Use EKS-optimized accelerated AMIs for GPU instances](ml-eks-optimized-ami.md).
+ Bottlerocket images don’t include an SSH server or a shell. You can use out-of-band access methods, such as enabling the admin container, to allow SSH access, and you can pass some bootstrapping configuration steps with user data. For more information, refer to the following sections in [Bottlerocket OS](https://github.com/bottlerocket-os/bottlerocket/blob/develop/README.md) on GitHub:
  +  [Exploration](https://github.com/bottlerocket-os/bottlerocket/blob/develop/README.md#exploration) 
  +  [Admin container](https://github.com/bottlerocket-os/bottlerocket/blob/develop/README.md#admin-container) 
  +  [Kubernetes settings](https://github.com/bottlerocket-os/bottlerocket/blob/develop/README.md#kubernetes-settings) 
+ Bottlerocket uses different container types:
  + By default, a [control container](https://github.com/bottlerocket-os/bottlerocket-control-container) is enabled. This container runs the [AWS Systems Manager agent](https://github.com/aws/amazon-ssm-agent) that you can use to run commands or start shell sessions on Amazon EC2 Bottlerocket instances. For more information, see [Setting up Session Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-getting-started.html) in the *AWS Systems Manager User Guide*.
  + If an SSH key is given when creating the node group, an admin container is enabled. We recommend using the admin container only for development and testing scenarios. We don’t recommend using it for production environments. For more information, see [Admin container](https://github.com/bottlerocket-os/bottlerocket/blob/develop/README.md#admin-container) on GitHub.

## More information
<a name="bottlerocket-more-information"></a>

For more information about using Amazon EKS optimized Bottlerocket AMIs, see the following sections:
+ For details about Bottlerocket, see the [Bottlerocket Documentation](https://bottlerocket.dev/en/).
+ For version information resources, see [Retrieve Bottlerocket AMI version information](eks-ami-versions-bottlerocket.md).
+ To use Bottlerocket with managed node groups, see [Simplify node lifecycle with managed node groups](managed-node-groups.md).
+ To launch self-managed Bottlerocket nodes, see [Create self-managed Bottlerocket nodes](launch-node-bottlerocket.md).
+ To retrieve the latest IDs of the Amazon EKS optimized Bottlerocket AMIs, see [Retrieve recommended Bottlerocket AMI IDs](retrieve-ami-id-bottlerocket.md).
+ For details on compliance support, see [Meet compliance requirements with Bottlerocket](bottlerocket-compliance-support.md).

# Retrieve Bottlerocket AMI version information
<a name="eks-ami-versions-bottlerocket"></a>

Each Bottlerocket AMI release includes various versions of [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/), the Bottlerocket kernel, and [containerd](https://containerd.io/). Accelerated AMI variants also include various versions of the NVIDIA driver. You can find this version information in the [OS](https://bottlerocket.dev/en/os/) topic of the *Bottlerocket Documentation*. From this page, navigate to the applicable *Version Information* sub-topic.

The *Bottlerocket Documentation* can sometimes lag behind the versions that are available on GitHub. You can find a list of changes for the latest versions in the [releases](https://github.com/bottlerocket-os/bottlerocket/releases) on GitHub.

# Retrieve recommended Bottlerocket AMI IDs
<a name="retrieve-ami-id-bottlerocket"></a>

When deploying nodes, you can specify an ID for a pre-built Amazon EKS optimized Amazon Machine Image (AMI). To retrieve an AMI ID that fits your desired configuration, query the AWS Systems Manager Parameter Store API. Using this API eliminates the need to manually look up Amazon EKS optimized AMI IDs. For more information, see [GetParameter](https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_GetParameter.html). The [IAM principal](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html#iam-term-principal) that you use must have the `ssm:GetParameter` IAM permission to retrieve the Amazon EKS optimized AMI metadata.

You can retrieve the image ID of the latest recommended Amazon EKS optimized Bottlerocket AMI with the following AWS CLI command, which uses the sub-parameter `image_id`. Make the following modifications to the command as needed and then run the modified command:
+ Replace *kubernetes-version* with a supported [platform-version](https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html).
+ Replace *-flavor* with one of the following options.
  + Remove *-flavor* for variants without a GPU.
  + Use *-nvidia* for GPU-enabled variants.
  + Use *-fips* for FIPS-enabled variants.
+ Replace *architecture* with one of the following options.
  + Use *x86_64* for `x86` based instances.
  + Use *arm64* for ARM instances.
+ Replace *region-code* with an [Amazon EKS supported AWS Region](https://docs.aws.amazon.com/general/latest/gr/eks.html) for which you want the AMI ID.

```
aws ssm get-parameter --name /aws/service/bottlerocket/aws-k8s-kubernetes-version-flavor/architecture/latest/image_id \
    --region region-code --query "Parameter.Value" --output text
```

Here’s an example command after placeholder replacements have been made.

```
aws ssm get-parameter --name /aws/service/bottlerocket/aws-k8s-1.31/x86_64/latest/image_id \
    --region us-west-2 --query "Parameter.Value" --output text
```

An example output is as follows.

```
ami-1234567890abcdef0
```

# Meet compliance requirements with Bottlerocket
<a name="bottlerocket-compliance-support"></a>

Bottlerocket complies with recommendations defined by various organizations:
+ There is a [CIS Benchmark](https://www.cisecurity.org/benchmark/bottlerocket) defined for Bottlerocket. In its default configuration, the Bottlerocket image has most of the controls required by the CIS Level 1 configuration profile. You can implement the controls required for a CIS Level 2 configuration profile. For more information, see [Validating Amazon EKS optimized Bottlerocket AMI against the CIS Benchmark](https://aws.amazon.com/blogs/containers/validating-amazon-eks-optimized-bottlerocket-ami-against-the-cis-benchmark) on the AWS blog.
+ The optimized feature set and reduced attack surface means that Bottlerocket instances require less configuration to satisfy PCI DSS requirements. The [CIS Benchmark for Bottlerocket](https://www.cisecurity.org/benchmark/bottlerocket) is an excellent resource for hardening guidance, and supports your requirements for secure configuration standards under PCI DSS requirement 2.2. You can also leverage [Fluent Bit](https://opensearch.org/blog/technical-post/2022/07/bottlerocket-k8s-fluent-bit/) to support your requirements for operating system level audit logging under PCI DSS requirement 10.2. AWS publishes new (patched) Bottlerocket instances periodically to help you meet PCI DSS requirement 6.2 (for v3.2.1) and requirement 6.3.3 (for v4.0).
+ Bottlerocket is a HIPAA-eligible feature authorized for use with regulated workloads for both Amazon EC2 and Amazon EKS. For more information, see the [HIPAA Eligible Services Reference](https://aws.amazon.com/compliance/hipaa-eligible-services-reference/).
+ Bottlerocket AMIs are available that are preconfigured to use FIPS 140-3 validated cryptographic modules. This includes the Amazon Linux 2023 Kernel Crypto API Cryptographic Module and the AWS-LC Cryptographic Module. For more information, see [Make your worker nodes FIPS ready with Bottlerocket FIPS AMIs](bottlerocket-fips-amis.md).

# Make your worker nodes FIPS ready with Bottlerocket FIPS AMIs
<a name="bottlerocket-fips-amis"></a>

The Federal Information Processing Standard (FIPS) Publication 140-3 is a United States and Canadian government standard that specifies the security requirements for cryptographic modules that protect sensitive information. Bottlerocket makes it easier to adhere to FIPS by offering AMIs with a FIPS kernel.

These AMIs are preconfigured to use FIPS 140-3 validated cryptographic modules. This includes the Amazon Linux 2023 Kernel Crypto API Cryptographic Module and the Go Cryptographic Module.

Using Bottlerocket FIPS AMIs makes your worker nodes "FIPS ready" but not automatically "FIPS-compliant". For more information, see [Federal Information Processing Standard (FIPS) 140-3](https://aws.amazon.com/compliance/fips/).

## Considerations
<a name="_considerations"></a>
+ If your cluster uses isolated subnets, the Amazon ECR FIPS endpoint may not be accessible. This can cause the node bootstrap to fail. Make sure that your network configuration allows access to the necessary FIPS endpoints. For more information, see [Access a resource through a resource VPC endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/use-resource-endpoint.html) in the *AWS PrivateLink Guide*.
+ If your cluster uses a subnet with [PrivateLink](vpc-interface-endpoints.md), image pulls will fail because Amazon ECR FIPS endpoints are not available through PrivateLink.

## Create a managed node group with a Bottlerocket FIPS AMI
<a name="_create_a_managed_node_group_with_a_bottlerocket_fips_ami"></a>

The Bottlerocket FIPS AMI comes in four variants to support your workloads:
+  `BOTTLEROCKET_x86_64_FIPS` 
+  `BOTTLEROCKET_ARM_64_FIPS` 
+  `BOTTLEROCKET_x86_64_NVIDIA_FIPS` 
+  `BOTTLEROCKET_ARM_64_NVIDIA_FIPS` 

To create a managed node group with a Bottlerocket FIPS AMI, choose the applicable AMI type during the creation process. For more information, see [Create a managed node group for your cluster](create-managed-node-group.md).

For more information on selecting FIPS-enabled variants, see [Retrieve recommended Bottlerocket AMI IDs](retrieve-ami-id-bottlerocket.md).

## Disable the FIPS endpoint for non-supported AWS Regions
<a name="disable_the_fips_endpoint_for_non_supported_shared_aws_regions"></a>

Bottlerocket FIPS AMIs are supported directly in the United States, including AWS GovCloud (US) Regions. For AWS Regions where the AMIs are available but not supported directly, you can still use the AMIs by creating a managed node group with a launch template.

The Bottlerocket FIPS AMI relies on the Amazon ECR FIPS endpoint during bootstrap, which is not generally available outside of the United States. To use the AMI for its FIPS kernel in AWS Regions that don’t have the Amazon ECR FIPS endpoint available, follow these steps to disable the FIPS endpoint:

1. Create a new configuration file with the following content or incorporate the content into your existing configuration file.

```
[default]
use_fips_endpoint=false
```

2. Encode the file content in Base64 format (a minimal example command follows these steps).

3. In your launch template’s `UserData`, add the following encoded string using TOML format:

```
[settings.aws]
config = "<your-base64-encoded-string>"
```
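
As referenced in step 2, the following is a minimal sketch for producing the Base64 string from the file created in step 1. The file name is a placeholder, and the `-w 0` flag assumes GNU coreutils (Linux); other platforms may use different flags.

```
# Base64-encode the AWS config file as a single line for use in the TOML user data.
base64 -w 0 fips-config.toml
```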

For other settings, see Bottlerocket’s [Description of settings](https://github.com/bottlerocket-os/bottlerocket?tab=readme-ov-file#description-of-settings) on GitHub.

Here is an example of `UserData` in a launch template:

```
[settings]
motd = "Hello from eksctl!"
[settings.aws]
config = "W2RlZmF1bHRdCnVzZV9maXBzX2VuZHBvaW50PWZhbHNlCg==" # Base64-encoded string.
[settings.kubernetes]
api-server = "<api-server-endpoint>"
cluster-certificate = "<cluster-certificate-authority>"
cluster-name = "<cluster-name>"
...<other-settings>
```

For more information on creating a launch template with user data, see [Customize managed nodes with launch templates](launch-templates.md).

# Create nodes with optimized Ubuntu Linux AMIs
<a name="eks-partner-amis"></a>

Canonical has partnered with Amazon EKS to create node AMIs that you can use in your clusters.

 [Canonical](https://www.canonical.com/) delivers a built-for-purpose Kubernetes Node OS image. This minimized Ubuntu image is optimized for Amazon EKS and includes the custom AWS kernel that is jointly developed with AWS. For more information, see [Ubuntu on Amazon Elastic Kubernetes Service (EKS)](https://cloud-images.ubuntu.com/aws-eks/) and [Create self-managed Ubuntu Linux nodes](launch-node-ubuntu.md). For information about support, see the [Third-party software](https://aws.amazon.com/premiumsupport/faqs/#Third-party_software) section of the *AWS Premium Support FAQs*.

# Create nodes with optimized Windows AMIs
<a name="eks-optimized-windows-ami"></a>

Windows Amazon EKS optimized AMIs are built on top of Windows Server 2019, Windows Server 2022, and Windows Server 2025. They are configured to serve as the base image for Amazon EKS nodes. By default, the AMIs include the following components:
+  [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) 
+  [kube-proxy](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/) 
+  [AWS IAM Authenticator for Kubernetes](https://github.com/kubernetes-sigs/aws-iam-authenticator) 
+  [csi-proxy](https://github.com/kubernetes-csi/csi-proxy) 
+  [containerd](https://containerd.io/) 

**Note**  
You can track security or privacy events for Windows Server with the [Microsoft security update guide](https://portal.msrc.microsoft.com/en-us/security-guidance).

Amazon EKS offers AMIs that are optimized for Windows containers in the following variants:
+ Amazon EKS-optimized Windows Server 2019 Core AMI
+ Amazon EKS-optimized Windows Server 2019 Full AMI
+ Amazon EKS-optimized Windows Server 2022 Core AMI
+ Amazon EKS-optimized Windows Server 2022 Full AMI
+ Amazon EKS-optimized Windows Server 2025 Core AMI
+ Amazon EKS-optimized Windows Server 2025 Full AMI

**Important**  
The Amazon EKS-optimized Windows Server 20H2 Core AMI is deprecated. No new versions of this AMI will be released.
To ensure that you have the latest security updates by default, Amazon EKS maintains optimized Windows AMIs for the last 4 months. Each new AMI is available for 4 months from the time of initial release. After this period, older AMIs are made private and are no longer accessible. We encourage using the latest AMIs to avoid security vulnerabilities and to avoid losing access to older AMIs that have reached the end of their supported lifetime. While we can’t guarantee that we can provide access to AMIs that have been made private, you can request access by filing a ticket with AWS Support.

## Release calendar
<a name="windows-ami-release-calendar"></a>

The following table lists the release and end of support dates for Windows versions on Amazon EKS. If an end date is blank, it’s because the version is still supported.


| Windows version | Amazon EKS release | Amazon EKS end of support | 
| --- | --- | --- | 
|  Windows Server 2025 Core  |  01/27/2026  |  | 
|  Windows Server 2025 Full  |  01/27/2026  |  | 
|  Windows Server 2022 Core  |  10/17/2022  |  | 
|  Windows Server 2022 Full  |  10/17/2022  |  | 
|  Windows Server 20H2 Core  |  8/12/2021  |  8/9/2022  | 
|  Windows Server 2004 Core  |  8/19/2020  |  12/14/2021  | 
|  Windows Server 2019 Core  |  10/7/2019  |  | 
|  Windows Server 2019 Full  |  10/7/2019  |  | 
|  Windows Server 1909 Core  |  10/7/2019  |  12/8/2020  | 

## Bootstrap script configuration parameters
<a name="bootstrap-script-configuration-parameters"></a>

When you create a Windows node, there’s a script on the node that allows for configuring different parameters. Depending on your setup, this script can be found on the node at a location similar to: `C:\Program Files\Amazon\EKS\Start-EKSBootstrap.ps1`. You can specify custom parameter values by specifying them as arguments to the bootstrap script. For example, you can update the user data in the launch template. For more information, see [Amazon EC2 user data](launch-templates.md#launch-template-user-data).

The script includes the following command-line parameters:
+  `-EKSClusterName` – Specifies the Amazon EKS cluster name for this worker node to join.
+  `-KubeletExtraArgs` – Specifies extra arguments for `kubelet` (optional).
+  `-KubeProxyExtraArgs` – Specifies extra arguments for `kube-proxy` (optional).
+  `-APIServerEndpoint` – Specifies the Amazon EKS cluster API server endpoint (optional). Only valid when used with `-Base64ClusterCA`. Bypasses calling `Get-EKSCluster`.
+  `-Base64ClusterCA` – Specifies the base64 encoded cluster CA content (optional). Only valid when used with `-APIServerEndpoint`. Bypasses calling `Get-EKSCluster`.
+  `-DNSClusterIP` – Overrides the IP address to use for DNS queries within the cluster (optional). Defaults to `10.100.0.10` or `172.20.0.10` based on the IP address of the primary interface.
+  `-ServiceCIDR` – Overrides the Kubernetes service IP address range from which cluster services are addressed. Defaults to `172.20.0.0/16` or `10.100.0.0/16` based on the IP address of the primary interface.
+  `-ExcludedSnatCIDRs` – A list of `IPv4` CIDRs to exclude from Source Network Address Translation (SNAT). This means that the pod private IP which is VPC addressable wouldn’t be translated to the IP address of the instance ENI’s primary `IPv4` address for outbound traffic. By default, the `IPv4` CIDR of the VPC for the Amazon EKS Windows node is added. Specifying CIDRs to this parameter also additionally excludes the specified CIDRs. For more information, see [Enable outbound internet access for Pods](external-snat.md).

In addition to the command line parameters, you can also specify some environment variable parameters. When specifying a command line parameter, it takes precedence over the respective environment variable. Environment variables must be defined as machine (or system) scoped because the bootstrap script only reads machine-scoped variables.

The script takes into account the following environment variables:
+  `SERVICE_IPV4_CIDR` – Refer to the `ServiceCIDR` command line parameter for the definition.
+  `EXCLUDED_SNAT_CIDRS` – Should be a comma separated string. Refer to the `ExcludedSnatCIDRs` command line parameter for the definition.

### gMSA authentication support
<a name="ad-and-gmsa-support"></a>

Amazon EKS Windows Pods allow different types of group Managed Service Account (gMSA) authentication.
+ Amazon EKS supports Active Directory domain identities for authentication. For more information on domain-joined gMSA, see [Windows Authentication on Amazon EKS Windows pods](https://aws.amazon.com/blogs/containers/windows-authentication-on-amazon-eks-windows-pods) on the AWS blog.
+ Amazon EKS offers a plugin that enables non-domain-joined Windows nodes to retrieve gMSA credentials with a portable user identity. For more information on domainless gMSA, see [Domainless Windows Authentication for Amazon EKS Windows pods](https://aws.amazon.com/blogs/containers/domainless-windows-authentication-for-amazon-eks-windows-pods) on the AWS blog.

## Cached container images
<a name="windows-cached-container-images"></a>

Amazon EKS Windows optimized AMIs have certain container images cached for the `containerd` runtime. Container images are cached when building custom AMIs using Amazon-managed build components. For more information, see [Using the Amazon-managed build component](eks-custom-ami-windows.md#custom-windows-ami-build-component).

The following cached container images are for the `containerd` runtime:
+  `amazonaws.com/eks/pause-windows` 
+  `mcr.microsoft.com/windows/nanoserver` 
+  `mcr.microsoft.com/windows/servercore` 

## More information
<a name="windows-more-information"></a>

For more information about using Amazon EKS optimized Windows AMIs, see the following sections:
+ For details on running workloads on Amazon EKS optimized accelerated Windows AMIs, see [Run GPU-accelerated containers (Windows on EC2 G-Series)](ml-eks-windows-optimized-ami.md).
+ To use Windows with managed node groups, see [Simplify node lifecycle with managed node groups](managed-node-groups.md).
+ To launch self-managed Windows nodes, see [Create self-managed Microsoft Windows nodes](launch-windows-workers.md).
+ For version information, see [Retrieve Windows AMI version information](eks-ami-versions-windows.md).
+ To retrieve the latest IDs of the Amazon EKS optimized Windows AMIs, see [Retrieve recommended Microsoft Windows AMI IDs](retrieve-windows-ami-id.md).
+ To use Amazon EC2 Image Builder to create custom Amazon EKS optimized Windows AMIs, see [Build a custom Windows AMI with Image Builder](eks-custom-ami-windows.md).
+ For best practices, see [Amazon EKS optimized Windows AMI management](https://aws.github.io/aws-eks-best-practices/windows/docs/ami/) in the *EKS Best Practices Guide*.

# Create self-managed Windows Server 2022 nodes with `eksctl`
<a name="self-managed-windows-server-2022"></a>

You can use the following `test-windows-2022.yaml` as reference for creating self-managed Windows Server 2022 nodes. Replace every *example value* with your own values.

**Note**  
You must use `eksctl` version [0.116.0](https://github.com/weaveworks/eksctl/releases/tag/v0.116.0) or later to run self-managed Windows Server 2022 nodes.

```
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: windows-2022-cluster
  region: region-code
  version: '1.35'

nodeGroups:
  - name: windows-ng
    instanceType: m5.2xlarge
    amiFamily: WindowsServer2022FullContainer
    volumeSize: 100
    minSize: 2
    maxSize: 3
  - name: linux-ng
    amiFamily: AmazonLinux2
    minSize: 2
    maxSize: 3
```

The node groups can then be created using the following command.

```
eksctl create cluster -f test-windows-2022.yaml
```

# Retrieve Windows AMI version information
<a name="eks-ami-versions-windows"></a>

This topic lists versions of the Amazon EKS optimized Windows AMIs and their corresponding versions of [kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/), [containerd](https://containerd.io/), and [csi-proxy](https://github.com/kubernetes-csi/csi-proxy).

The Amazon EKS optimized AMI metadata, including the AMI ID, for each variant can be retrieved programmatically. For more information, see [Retrieve recommended Microsoft Windows AMI IDs](retrieve-windows-ami-id.md).

AMIs are versioned by Kubernetes version and the release date of the AMI in the following format:

```
k8s_major_version.k8s_minor_version-release_date
```

**Note**  
Amazon EKS managed node groups support the November 2022 and later releases of the Windows AMIs.

To receive notifications of all source file changes to this specific documentation page, you can subscribe to the following URL with an RSS reader:

```
https://github.com/awsdocs/amazon-eks-user-guide/commits/mainline/latest/ug/nodes/eks-ami-versions-windows.adoc.atom
```

## Amazon EKS optimized Windows Server 2025 Core AMI
<a name="eks-ami-versions-windows-2025-core"></a>

The following tables list the current and previous versions of the Amazon EKS optimized Windows Server 2025 Core AMI.

**Example**  


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.35-2026.02.16`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 
|   `1.35-2026.01.22`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 

## Amazon EKS optimized Windows Server 2025 Full AMI
<a name="eks-ami-versions-windows-2025-full"></a>

The following tables list the current and previous versions of the Amazon EKS optimized Windows Server 2025 Full AMI.

**Example**  


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.35-2026.02.16`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 
|   `1.35-2026.01.22`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 

## Amazon EKS optimized Windows Server 2022 Core AMI
<a name="eks-ami-versions-windows-2022-core"></a>

The following tables list the current and previous versions of the Amazon EKS optimized Windows Server 2022 Core AMI.

**Example**  


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.35-2026.02.16`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 
|   `1.35-2026.01.22`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.34-2026.02.13`   |   `1.34.3`   |   `2.1.6`   |   `1.2.1`   |  | 
|   `1.34-2026.01.22`   |   `1.34.2`   |   `2.1.6`   |   `1.2.1`   |  Upgraded `containerd` to `2.1.6`.  | 
|   `1.34-2025.12.15`   |   `1.34.2`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.11.14`   |   `1.34.1`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.10.18`   |   `1.34.1`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.09.13`   |   `1.34.0`   |   `2.1.4`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.33-2026.02.13`   |   `1.33.7`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.33-2026.01.22`   |   `1.33.5`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.33-2025.12.15`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.11.14`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.10.18`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.09.13`   |   `1.33.4`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.33-2025.08.18`   |   `1.33.3`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.07.16`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.06.13`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.05.17`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.32-2026.02.13`   |   `1.32.11`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.32-2026.01.22`   |   `1.32.9`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.32-2025.12.15`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.11.14`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.10.18`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.09.13`   |   `1.32.8`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.32-2025.08.18`   |   `1.32.7`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.32-2025.07.16`   |   `1.32.5`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.32-2025.06.13`   |   `1.32.5`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.32-2025.05.17`   |   `1.32.5`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.32-2025.04.14`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.03.14`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.02.18`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.01.15`   |   `1.32.0`   |   `1.7.20`   |   `1.1.3`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.31-2026.02.13`   |   `1.31.14`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.31-2026.01.22`   |   `1.31.13`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.31-2025.12.15`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.11.14`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.10.18`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.09.13`   |   `1.31.12`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.31-2025.08.18`   |   `1.31.11`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.31-2025.07.16`   |   `1.31.9`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.31-2025.06.13`   |   `1.31.9`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.31-2025.05.17`   |   `1.31.9`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.31-2025.04.14`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.03.14`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.02.15`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.01.15`   |   `1.31.4`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.01.01`   |   `1.31.4`   |   `1.7.20`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.31-2024.12.13`   |   `1.31.3`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.11.12`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.10.08`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.10.01`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.09.10`   |   `1.31.0`   |   `1.7.20`   |   `1.1.3`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.30-2026.02.13`   |   `1.30.14`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.30-2026.01.22`   |   `1.30.14`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.30-2025.12.15`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.11.21`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.10.18`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.09.13`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.30-2025.08.18`   |   `1.30.14`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.30-2025.07.16`   |   `1.30.13`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.30-2025.06.13`   |   `1.30.13`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.30-2025.05.17`   |   `1.30.13`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.30-2025.04.14`   |   `1.30.9`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.30-2025.03.14`   |   `1.30.9`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.30-2025.02.15`   |   `1.30.9`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2025.01.15`   |   `1.30.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2025.01.01`   |   `1.30.8`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.30-2024.12.11`   |   `1.30.7`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.11.12`   |   `1.30.4`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.10.08`   |   `1.30.4`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.09.10`   |   `1.30.2`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.08.13`   |   `1.30.2`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.07.10`   |   `1.30.2`   |   `1.7.14`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.30-2024.06.17`   |   `1.30.0`   |   `1.7.14`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.14`.  | 
|   `1.30-2024.05.15`   |   `1.30.0`   |   `1.6.28`   |   `1.1.2`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.29-2026.02.13`   |   `1.29.15`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.29-2026.01.22`   |   `1.29.15`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.29-2025.12.15`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.11.14`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.10.18`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.09.13`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.29-2025.08.18`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.29-2025.07.16`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.29-2025.06.13`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.29-2025.05.17`   |   `1.29.15`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.29-2025.04.14`   |   `1.29.13`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.29-2025.03.14`   |   `1.29.13`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.29-2025.02.15`   |   `1.29.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2025.01.15`   |   `1.29.12`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2025.01.01`   |   `1.29.12`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.29-2024.12.11`   |   `1.29.10`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.11.12`   |   `1.29.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.10.08`   |   `1.29.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.09.10`   |   `1.29.6`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.08.13`   |   `1.29.6`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.07.10`   |   `1.29.6`   |   `1.7.11`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.29-2024.06.17`   |   `1.29.3`   |   `1.7.11`   |   `1.1.2`   |  | 
|   `1.29-2024.05.15`   |   `1.29.3`   |   `1.7.11`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.11`. Upgraded `kubelet` to `1.29.3`.  | 
|   `1.29-2024.04.09`   |   `1.29.0`   |   `1.6.28`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.28`. Rebuilt CNI and `csi-proxy` using `golang 1.22.1`.  | 
|   `1.29-2024.03.12`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  | 
|   `1.29-2024.02.13`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  | 
|   `1.29-2024.02.06`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  Fixed a bug where the pause image was incorrectly deleted by the `kubelet` garbage collection process.  | 
|   `1.29-2024.01.11`   |   `1.29.0`   |   `1.6.18`   |   `1.1.2`   |  Excluded Standalone Windows Update [KB5034439](https://support.microsoft.com/en-au/topic/kb5034439-windows-recovery-environment-update-for-azure-stack-hci-version-22h2-and-windows-server-2022-january-9-2024-6f9d26e6-784c-4503-a3c6-0beedda443ca) on Windows Server 2022 Core AMIs. The KB applies only to Windows installations with a separate WinRE partition, which aren’t included with any of our Amazon EKS Optimized Windows AMIs.  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.28-2025.11.14`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.28-2025.10.18`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.28-2025.09.13`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.28-2025.08.18`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.28-2025.07.16`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.28-2025.06.13`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.28-2025.05.17`   |   `1.28.15`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.28-2025.04.14`   |   `1.28.15`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.28-2025.03.14`   |   `1.28.15`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.28-2025.02.15`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2025.01.15`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2025-01-01`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.28-2024.12.11`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.11.12`   |   `1.28.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.10.08`   |   `1.28.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.09.10`   |   `1.28.11`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.08.13`   |   `1.28.11`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.07.10`   |   `1.28.11`   |   `1.7.11`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.28-2024.06.17`   |   `1.28.8`   |   `1.7.11`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.11`.  | 
|   `1.28-2024.05.14`   |   `1.28.8`   |   `1.6.28`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.28`. Upgraded `kubelet` to `1.28.8`.  | 
|   `1.28-2024.04.09`   |   `1.28.5`   |   `1.6.25`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.25`. Rebuilt CNI and `csi-proxy` using `golang 1.22.1`.  | 
|   `1.28-2024.03.12`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2024.02.13`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2024.01.11`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  Excluded Standalone Windows Update [KB5034439](https://support.microsoft.com/en-au/topic/kb5034439-windows-recovery-environment-update-for-azure-stack-hci-version-22h2-and-windows-server-2022-january-9-2024-6f9d26e6-784c-4503-a3c6-0beedda443ca) on Windows Server 2022 Core AMIs. The KB applies only to Windows installations with a separate WinRE partition, which aren’t included with any of our Amazon EKS Optimized Windows AMIs.  | 
|   `1.28-2023.12.12`   |   `1.28.3`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2023.11.14`   |   `1.28.3`   |   `1.6.18`   |   `1.1.2`   |  Includes patches for `CVE-2023-5528`.  | 
|   `1.28-2023.10.19`   |   `1.28.2`   |   `1.6.18`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.18`. Added new [bootstrap script environment variables](eks-optimized-windows-ami.md#bootstrap-script-configuration-parameters) (`SERVICE_IPV4_CIDR` and `EXCLUDED_SNAT_CIDRS`).  | 
|   `1.28-2023-09.27`   |   `1.28.2`   |   `1.6.6`   |   `1.1.2`   |  Fixed a [security advisory](https://github.com/advisories/GHSA-6xv5-86q9-7xr8) in `kubelet`.  | 
|   `1.28-2023.09.12`   |   `1.28.1`   |   `1.6.6`   |   `1.1.2`   |  | 

## Amazon EKS optimized Windows Server 2022 Full AMI
<a name="eks-ami-versions-windows-2022-full"></a>

The following tables list the current and previous versions of the Amazon EKS optimized Windows Server 2022 Full AMI, grouped by Kubernetes minor version.

| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.35-2026.02.16`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 
|   `1.35-2026-01-22`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.34-2026.02.13`   |   `1.34.3`   |   `2.1.6`   |   `1.2.1`   |  | 
|   `1.34-2026.01.22`   |   `1.34.2`   |   `2.1.6`   |   `1.2.1`   |  Upgraded `containerd` to `2.1.6`.  | 
|   `1.34-2025.12.15`   |   `1.34.2`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.11.14`   |   `1.34.1`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.10.18`   |   `1.34.1`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.09.13`   |   `1.34.0`   |   `2.1.4`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.33-2026.02.13`   |   `1.33.7`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.33-2026.01.22`   |   `1.33.5`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.33-2025.12.15`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.11.14`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.10.18`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.09.13`   |   `1.33.4`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.33-2025.08.18`   |   `1.33.3`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.07.16`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.06.13`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.05.17`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.32-2026.02.13`   |   `1.32.11`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.32-2026.01.22`   |   `1.32.9`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.32-2025.12.15`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.11.14`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.10.18`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.09.13`   |   `1.32.8`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.32-2025.08.18`   |   `1.32.7`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.32-2025.07.16`   |   `1.32.5`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.32-2025.06.13`   |   `1.32.5`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.32-2025.05.17`   |   `1.32.5`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.32-2025.04.14`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.03.14`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.02.18`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.01.01`   |   `1.32.0`   |   `1.7.20`   |   `1.1.3`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.31-2026.02.13`   |   `1.31.14`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.31-2026.01.22`   |   `1.31.13`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.31-2025.12.15`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.11.14`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.10.18`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.09.13`   |   `1.31.12`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.31-2025.08.18`   |   `1.31.11`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.31-2025.07.16`   |   `1.31.9`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.31-2025.06.13`   |   `1.31.9`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.31-2025.05.17`   |   `1.31.9`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.31-2025.04.14`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.03.14`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.02.15`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.01.15`   |   `1.31.4`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.01.01`   |   `1.31.4`   |   `1.7.20`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.31-2024.12.13`   |   `1.31.3`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.11.12`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.10.08`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.10.01`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.09.10`   |   `1.31.0`   |   `1.7.20`   |   `1.1.3`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.30-2026.02.13`   |   `1.30.14`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.30-2026.01.22`   |   `1.30.14`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.30-2025.12.15`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.11.21`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.10.18`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.09.13`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.30-2025.08.18`   |   `1.30.14`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.30-2025.07.16`   |   `1.30.13`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.30-2025.06.13`   |   `1.30.13`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.30-2025.05.17`   |   `1.30.13`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.30-2025.04.14`   |   `1.30.9`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.30-2025.03.14`   |   `1.30.9`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.30-2025.02.15`   |   `1.30.9`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2025.01.15`   |   `1.30.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2025.01.01`   |   `1.30.8`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.30-2024.12.11`   |   `1.30.7`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.11.12`   |   `1.30.4`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.10.08`   |   `1.30.4`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.09.10`   |   `1.30.2`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.08.13`   |   `1.30.2`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.07.10`   |   `1.30.2`   |   `1.7.14`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.30-2024.06.17`   |   `1.30.0`   |   `1.7.14`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.14`.  | 
|   `1.30-2024.05.15`   |   `1.30.0`   |   `1.6.28`   |   `1.1.2`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.29-2026.02.13`   |   `1.29.15`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.29-2026.01.22`   |   `1.29.15`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.29-2025.12.15`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.11.14`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.10.18`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.09.13`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.29-2025.08.18`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.29-2025.07.16`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.29-2025.06.13`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.29-2025.05.17`   |   `1.29.15`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.29-2025.04.14`   |   `1.29.13`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.29-2025.03.14`   |   `1.29.13`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.29-2025.02.15`   |   `1.29.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2025.01.15`   |   `1.29.12`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2025.01.01`   |   `1.29.12`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.29-2024.12.11`   |   `1.29.10`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.11.12`   |   `1.29.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.10.08`   |   `1.29.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.09.10`   |   `1.29.6`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.08.13`   |   `1.29.6`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.07.10`   |   `1.29.6`   |   `1.7.11`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.29-2024.06.17`   |   `1.29.3`   |   `1.7.11`   |   `1.1.2`   |  | 
|   `1.29-2024.05.15`   |   `1.29.3`   |   `1.7.11`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.11`. Upgraded `kubelet` to `1.29.3`.  | 
|   `1.29-2024.04.09`   |   `1.29.0`   |   `1.6.28`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.28`. Rebuilt CNI and `csi-proxy` using `golang 1.22.1`.  | 
|   `1.29-2024.03.12`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  | 
|   `1.29-2024.02.13`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  | 
|   `1.29-2024.02.06`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  Fixed a bug where the pause image was incorrectly deleted by the `kubelet` garbage collection process.  | 
|   `1.29-2024.01.09`   |   `1.29.0`   |   `1.6.18`   |   `1.1.2`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.28-2025.11.14`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.28-2025.10.18`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.28-2025.09.13`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.28-2025.08.18`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.28-2025.07.16`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.28-2025.06.13`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.28-2025.05.17`   |   `1.28.15`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.28-2025.04.14`   |   `1.28.15`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.28-2025.03.14`   |   `1.28.15`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.28-2025.02.15`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2025.01.15`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2025.01.01`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.28-2024.12.11`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.11.12`   |   `1.28.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.10.08`   |   `1.28.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.09.10`   |   `1.28.11`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.08.13`   |   `1.28.11`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.07.10`   |   `1.28.11`   |   `1.7.11`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.28-2024.06.17`   |   `1.28.8`   |   `1.7.11`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.11`.  | 
|   `1.28-2024.05.14`   |   `1.28.8`   |   `1.6.28`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.28`. Upgraded `kubelet` to `1.28.8`.  | 
|   `1.28-2024.04.09`   |   `1.28.5`   |   `1.6.25`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.25`. Rebuilt CNI and `csi-proxy` using `golang 1.22.1`.  | 
|   `1.28-2024.03.12`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2024.02.13`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2024.01.09`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2023.12.12`   |   `1.28.3`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2023.11.14`   |   `1.28.3`   |   `1.6.18`   |   `1.1.2`   |  Includes patches for `CVE-2023-5528`.  | 
|   `1.28-2023.10.19`   |   `1.28.2`   |   `1.6.18`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.18`. Added new [bootstrap script environment variables](eks-optimized-windows-ami.md#bootstrap-script-configuration-parameters) (`SERVICE_IPV4_CIDR` and `EXCLUDED_SNAT_CIDRS`).  | 
|   `1.28-2023-09.27`   |   `1.28.2`   |   `1.6.6`   |   `1.1.2`   |  Fixed a [security advisory](https://github.com/advisories/GHSA-6xv5-86q9-7xr8) in `kubelet`.  | 
|   `1.28-2023.09.12`   |   `1.28.1`   |   `1.6.6`   |   `1.1.2`   |  | 

## Amazon EKS optimized Windows Server 2019 Core AMI
<a name="eks-ami-versions-windows-2019-core"></a>

The following tables list the current and previous versions of the Amazon EKS optimized Windows Server 2019 Core AMI, grouped by Kubernetes minor version.

| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.35-2026.02.16`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 
|   `1.35-2026-01-22`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.34-2026.02.13`   |   `1.34.3`   |   `2.1.6`   |   `1.2.1`   |  | 
|   `1.34-2026.01.22`   |   `1.34.2`   |   `2.1.6`   |   `1.2.1`   |  Upgraded `containerd` to `2.1.6`.  | 
|   `1.34-2025.12.15`   |   `1.34.2`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.11.14`   |   `1.34.1`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.10.18`   |   `1.34.1`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.09.13`   |   `1.34.0`   |   `2.1.4`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.33-2026.02.13`   |   `1.33.7`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.33-2026.01.22`   |   `1.33.5`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.33-2025.12.15`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.11.14`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.10.18`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.09.13`   |   `1.33.4`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.33-2025.08.18`   |   `1.33.3`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.07.16`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.06.13`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.05.17`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.32-2026.02.13`   |   `1.32.11`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.32-2026.01.22`   |   `1.32.9`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.32-2025.12.15`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.11.14`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.10.18`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.09.13`   |   `1.32.8`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.32-2025.08.18`   |   `1.32.7`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.32-2025.07.16`   |   `1.32.5`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.32-2025.06.13`   |   `1.32.5`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.32-2025.05.17`   |   `1.32.5`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.32-2025.04.14`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.03.14`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.02.18`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.01.15`   |   `1.32.4`   |   `1.7.20`   |   `1.1.3`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.31-2026.02.13`   |   `1.31.14`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.31-2026.01.22`   |   `1.31.13`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.31-2025.12.15`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.11.14`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.10.18`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.09.13`   |   `1.31.12`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.31-2025.08.18`   |   `1.31.11`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.31-2025.07.16`   |   `1.31.9`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.31-2025.06.13`   |   `1.31.9`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.31-2025.05.17`   |   `1.31.9`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.31-2025.04.14`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.03.14`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.02.15`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.01.15`   |   `1.31.4`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.01.01`   |   `1.31.4`   |   `1.7.20`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.31-2024.12.13`   |   `1.31.3`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.11.12`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.10.08`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.10.01`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.09.10`   |   `1.31.0`   |   `1.7.20`   |   `1.1.3`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.30-2026.02.13`   |   `1.30.14`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.30-2026.01.22`   |   `1.30.14`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.30-2025.12.15`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.11.21`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.10.18`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.09.13`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.30-2025.08.18`   |   `1.30.14`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.30-2025.07.16`   |   `1.30.13`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.30-2025.06.13`   |   `1.30.13`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.30-2025.05.17`   |   `1.30.13`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.30-2025.04.14`   |   `1.30.9`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.30-2025.03.14`   |   `1.30.9`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.30-2025-02-15`   |   `1.30.9`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2025.01.15`   |   `1.30.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2025.01.01`   |   `1.30.8`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.30-2024.12.11`   |   `1.30.7`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.11.12`   |   `1.30.4`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.10.08`   |   `1.30.4`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.09.10`   |   `1.30.2`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.08.13`   |   `1.30.2`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.07.10`   |   `1.30.2`   |   `1.7.14`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.30-2024.06.17`   |   `1.30.0`   |   `1.7.14`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.14`.  | 
|   `1.30-2024.05.15`   |   `1.30.0`   |   `1.6.28`   |   `1.1.2`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.29-2026.02.13`   |   `1.29.15`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.29-2026.01.22`   |   `1.29.15`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.29-2025.12.15`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.11.14`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.10.18`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.09.13`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.29-2025.08.18`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.29-2025.07.16`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.29-2025.06.13`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.29-2025.05.17`   |   `1.29.15`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.29-2025.04.14`   |   `1.29.13`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.29-2025.03.14`   |   `1.29.13`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.29-2025.02.15`   |   `1.29.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2025.01.15`   |   `1.29.12`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2025.01.01`   |   `1.29.12`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.29-2024.12.11`   |   `1.29.10`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.11.12`   |   `1.29.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.10.08`   |   `1.29.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.09.10`   |   `1.29.6`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.08.13`   |   `1.29.6`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.07.10`   |   `1.29.6`   |   `1.7.11`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.29-2024.06.17`   |   `1.29.3`   |   `1.7.11`   |   `1.1.2`   |  | 
|   `1.29-2024.05.15`   |   `1.29.3`   |   `1.7.11`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.11`. Upgraded `kubelet` to `1.29.3`.  | 
|   `1.29-2024.04.09`   |   `1.29.0`   |   `1.6.28`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.28`. Rebuilt CNI and `csi-proxy` using `golang 1.22.1`.  | 
|   `1.29-2024.03.13`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  | 
|   `1.29-2024.02.13`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  | 
|   `1.29-2024.02.06`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  Fixed a bug where the pause image was incorrectly deleted by the `kubelet` garbage collection process.  | 
|   `1.29-2024.01.09`   |   `1.29.0`   |   `1.6.18`   |   `1.1.2`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.28-2025.11.14`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.28-2025.10.18`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.28-2025.09.13`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.28-2025.08.18`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.28-2025.07.16`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.28-2025.06.13`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.28-2025.05.17`   |   `1.28.15`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.28-2025.04.14`   |   `1.28.15`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.28-2025.03.14`   |   `1.28.15`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.28-2025.02.15`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2025-01-15`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2025-01-01`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.28-2024.12.11`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.11.12`   |   `1.28.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.10.08`   |   `1.28.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.09.10`   |   `1.28.11`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.08.13`   |   `1.28.11`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.07.10`   |   `1.28.11`   |   `1.7.11`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.28-2024.06.17`   |   `1.28.8`   |   `1.7.11`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.11`.  | 
|   `1.28-2024.05.14`   |   `1.28.8`   |   `1.6.28`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.28`. Upgraded `kubelet` to `1.28.8`.  | 
|   `1.28-2024.04.09`   |   `1.28.5`   |   `1.6.25`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.25`. Rebuilt CNI and `csi-proxy` using `golang 1.22.1`.  | 
|   `1.28-2024.03.13`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2024.02.13`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2024.01.09`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2023.12.12`   |   `1.28.3`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2023.11.14`   |   `1.28.3`   |   `1.6.18`   |   `1.1.2`   |  Includes patches for `CVE-2023-5528`.  | 
|   `1.28-2023.10.19`   |   `1.28.2`   |   `1.6.18`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.18`. Added new [bootstrap script environment variables](eks-optimized-windows-ami.md#bootstrap-script-configuration-parameters) (`SERVICE_IPV4_CIDR` and `EXCLUDED_SNAT_CIDRS`).  | 
|   `1.28-2023-09.27`   |   `1.28.2`   |   `1.6.6`   |   `1.1.2`   |  Fixed a [security advisory](https://github.com/advisories/GHSA-6xv5-86q9-7xr8) in `kubelet`.  | 
|   `1.28-2023.09.12`   |   `1.28.1`   |   `1.6.6`   |   `1.1.2`   |  | 

## Amazon EKS optimized Windows Server 2019 Full AMI
<a name="eks-ami-versions-windows-2019-full"></a>

The following tables list the current and previous versions of the Amazon EKS optimized Windows Server 2019 Full AMI, grouped by Kubernetes minor version.

| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.35-2026.02.16`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 
|   `1.35-2026-01-22`   |   `1.35.0`   |   `2.1.6`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.34-2026.02.13`   |   `1.34.3`   |   `2.1.6`   |   `1.2.1`   |  | 
|   `1.34-2026.01.22`   |   `1.34.2`   |   `2.1.6`   |   `1.2.1`   |  Upgraded `containerd` to `2.1.6`.  | 
|   `1.34-2025.12.15`   |   `1.34.2`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.11.14`   |   `1.34.1`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.10.18`   |   `1.34.1`   |   `2.1.4`   |   `1.2.1`   |  | 
|   `1.34-2025.09.13`   |   `1.34.0`   |   `2.1.4`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.33-2026.02.13`   |   `1.33.7`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.33-2026.01.22`   |   `1.33.5`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.33-2025.12.15`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.11.14`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.10.18`   |   `1.33.5`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.33-2025.09.13`   |   `1.33.4`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.33-2025.08.18`   |   `1.33.3`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.07.16`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.06.13`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.33-2025.05.17`   |   `1.33.1`   |   `1.7.27`   |   `1.2.1`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.32-2026.02.13`   |   `1.32.11`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.32-2026.01.22`   |   `1.32.9`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.32-2025.12.15`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.11.14`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.10.18`   |   `1.32.9`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.32-2025.09.13`   |   `1.32.8`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.32-2025.08.18`   |   `1.32.7`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.32-2025.07.16`   |   `1.32.5`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.32-2025.06.13`   |   `1.32.5`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.32-2025.05.17`   |   `1.32.5`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.32-2025.04.14`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.03.14`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.02.18`   |   `1.32.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.32-2025.01.15`   |   `1.32.0`   |   `1.7.20`   |   `1.1.3`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.31-2026.02.13`   |   `1.31.14`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.31-2026.01.22`   |   `1.31.13`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.31-2025.12.15`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.11.14`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.10.18`   |   `1.31.13`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.31-2025.09.13`   |   `1.31.12`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.31-2025.08.18`   |   `1.31.11`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.31-2025.07.16`   |   `1.31.9`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.31-2025.06.13`   |   `1.31.9`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.31-2025.05.17`   |   `1.31.9`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.31-2025.04.14`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.03.14`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.02.15`   |   `1.31.5`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.01.15`   |   `1.31.4`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2025.01.01`   |   `1.31.4`   |   `1.7.20`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.31-2024.12.13`   |   `1.31.3`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.11.12`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.10.08`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.10.01`   |   `1.31.1`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.31-2024.09.10`   |   `1.31.0`   |   `1.7.20`   |   `1.1.3`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.30-2026.02.13`   |   `1.30.14`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.30-2026.01.22`   |   `1.30.14`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.30-2025.12.15`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.11.21`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.10.18`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.30-2025.09.13`   |   `1.30.14`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.30-2025.08.18`   |   `1.30.14`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.30-2025.07.16`   |   `1.30.13`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.30-2025.06.13`   |   `1.30.13`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.30-2025.05.17`   |   `1.30.13`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.30-2025.04.14`   |   `1.30.9`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.30-2025.03.14`   |   `1.30.9`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.30-2025.02.15`   |   `1.30.9`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2025.01.15`   |   `1.30.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2025.01.01`   |   `1.30.8`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.30-2024.12.11`   |   `1.30.7`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.11.12`   |   `1.30.4`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.10.08`   |   `1.30.4`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.09.10`   |   `1.30.2`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.08.13`   |   `1.30.2`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.30-2024.07.10`   |   `1.30.2`   |   `1.7.14`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.30-2024.06.17`   |   `1.30.0`   |   `1.7.14`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.14`.  | 
|   `1.30-2024.05.15`   |   `1.30.0`   |   `1.6.28`   |   `1.1.2`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.29-2026.02.13`   |   `1.29.15`   |   `1.7.30`   |   `1.2.1`   |  | 
|   `1.29-2026.01.22`   |   `1.29.15`   |   `1.7.30`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.30`.  | 
|   `1.29-2025.12.15`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.11.14`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.10.18`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.29-2025.09.13`   |   `1.29.15`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.29-2025.08.18`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.29-2025.07.16`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.29-2025.06.13`   |   `1.29.15`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.29-2025.05.17`   |   `1.29.15`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.29-2025.04.14`   |   `1.29.13`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.29-2025.03.14`   |   `1.29.13`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.29-2025.02.15`   |   `1.29.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2025.01.15`   |   `1.29.12`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2025.01.01`   |   `1.29.12`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.29-2024.12.11`   |   `1.29.10`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.11.12`   |   `1.29.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.10.08`   |   `1.29.8`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.09.10`   |   `1.29.6`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.08.13`   |   `1.29.6`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.29-2024.07.10`   |   `1.29.6`   |   `1.7.11`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.29-2024.06.17`   |   `1.29.3`   |   `1.7.11`   |   `1.1.2`   |  | 
|   `1.29-2024.05.15`   |   `1.29.3`   |   `1.7.11`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.11`. Upgraded `kubelet` to `1.29.3`.  | 
|   `1.29-2024.04.09`   |   `1.29.0`   |   `1.6.28`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.28`. Rebuilt CNI and `csi-proxy` using `golang 1.22.1`.  | 
|   `1.29-2024.03.13`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  | 
|   `1.29-2024.02.13`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  | 
|   `1.29-2024.02.06`   |   `1.29.0`   |   `1.6.25`   |   `1.1.2`   |  Fixed a bug where the pause image was incorrectly deleted by the `kubelet` garbage collection process.  | 
|   `1.29-2024.01.09`   |   `1.29.0`   |   `1.6.18`   |   `1.1.2`   |  | 


| AMI version | kubelet version | containerd version | csi-proxy version | Release notes | 
| --- | --- | --- | --- | --- | 
|   `1.28-2025.11.14`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.28-2025.10.18`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  | 
|   `1.28-2025.09.13`   |   `1.28.15`   |   `1.7.28`   |   `1.2.1`   |  Changed GMSA plugin logs to Windows Events  | 
|   `1.28-2025.08.18`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.28-2025.07.16`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  | 
|   `1.28-2025.06.13`   |   `1.28.15`   |   `1.7.27`   |   `1.2.1`   |  Upgraded `containerd` to `1.7.27`.  | 
|   `1.28-2025.05.17`   |   `1.28.15`   |   `1.7.20`   |   `1.2.1`   |  | 
|   `1.28-2025.04.14`   |   `1.28.15`   |   `1.7.20`   |   `1.1.3`   |  | 
|   `1.28-2025.03.14`   |   `1.28.15`   |   `1.7.20`   |   `1.1.3`   |  Upgraded `containerd` to `1.7.20`.  | 
|   `1.28-2025.02.15`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2025-01-15`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2025-01-01`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  Includes patches for `CVE-2024-9042`.  | 
|   `1.28-2024.12.11`   |   `1.28.15`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.11.12`   |   `1.28.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.10.08`   |   `1.28.13`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.09.10`   |   `1.28.11`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.08.13`   |   `1.28.11`   |   `1.7.14`   |   `1.1.3`   |  | 
|   `1.28-2024.07.10`   |   `1.28.11`   |   `1.7.11`   |   `1.1.2`   |  Includes patches for `CVE-2024-5321`.  | 
|   `1.28-2024.06.17`   |   `1.28.8`   |   `1.7.11`   |   `1.1.2`   |  Upgraded `containerd` to `1.7.11`.  | 
|   `1.28-2024.05.14`   |   `1.28.8`   |   `1.6.28`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.28`. Upgraded `kubelet` to `1.28.8`.  | 
|   `1.28-2024.04.09`   |   `1.28.5`   |   `1.6.25`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.25`. Rebuilt CNI and `csi-proxy` using `golang 1.22.1`.  | 
|   `1.28-2024.03.13`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2024.02.13`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2024.01.09`   |   `1.28.5`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2023.12.12`   |   `1.28.3`   |   `1.6.18`   |   `1.1.2`   |  | 
|   `1.28-2023.11.14`   |   `1.28.3`   |   `1.6.18`   |   `1.1.2`   |  Includes patches for `CVE-2023-5528`.  | 
|   `1.28-2023.10.19`   |   `1.28.2`   |   `1.6.18`   |   `1.1.2`   |  Upgraded `containerd` to `1.6.18`. Added new [bootstrap script environment variables](eks-optimized-windows-ami.md#bootstrap-script-configuration-parameters) (`SERVICE_IPV4_CIDR` and `EXCLUDED_SNAT_CIDRS`).  | 
|   `1.28-2023-09.27`   |   `1.28.2`   |   `1.6.6`   |   `1.1.2`   |  Fixed a [security advisory](https://github.com/advisories/GHSA-6xv5-86q9-7xr8) in `kubelet`.  | 
|   `1.28-2023.09.12`   |   `1.28.1`   |   `1.6.6`   |   `1.1.2`   |  | 

# Retrieve recommended Microsoft Windows AMI IDs
<a name="retrieve-windows-ami-id"></a>

When deploying nodes, you can specify an ID for a pre-built Amazon EKS optimized Amazon Machine Image (AMI). To retrieve an AMI ID that fits your desired configuration, query the AWS Systems Manager Parameter Store API. Using this API eliminates the need to manually look up Amazon EKS optimized AMI IDs. For more information, see [GetParameter](https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_GetParameter.html). The [IAM principal](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html#iam-term-principal) that you use must have the `ssm:GetParameter` IAM permission to retrieve the Amazon EKS optimized AMI metadata.
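
A minimal sketch of granting that permission to a hypothetical IAM role follows; the role name, policy name, and resource scope are placeholders that you should adapt to your environment.

```
# Sketch: allow a hypothetical node-deployment role to read the Amazon EKS
# optimized AMI parameters. Narrow the Resource element further if you prefer.
aws iam put-role-policy \
    --role-name my-ami-lookup-role \
    --policy-name AllowGetEksWindowsAmiParameters \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "ssm:GetParameter",
          "Resource": "*"
        }
      ]
    }'
```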

You can retrieve the image ID of the latest recommended Amazon EKS optimized Windows AMI with the following command, which uses the sub-parameter `image_id`. Make the following modifications to the command as needed and then run the modified command:
+ Replace *release* with one of the following options.
  + Use *2025* for Windows Server 2025.
  + Use *2022* for Windows Server 2022.
  + Use *2019* for Windows Server 2019.
+ Replace *installation-option* with one of the following options. For more information, see [What is the Server Core installation option in Windows Server](https://learn.microsoft.com/en-us/windows-server/administration/server-core/what-is-server-core).
  + Use *Core* for a minimal installation with a smaller attack surface.
  + Use *Full* to include the Windows desktop experience.
+ Replace *kubernetes-version* with a supported Kubernetes version. For more information, see [Amazon EKS platform versions](https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html).
+ Replace *region-code* with an [Amazon EKS supported AWS Region](https://docs.aws.amazon.com/general/latest/gr/eks.html) for which you want the AMI ID.

```
aws ssm get-parameter --name /aws/service/ami-windows-latest/Windows_Server-release-English-installation-option-EKS_Optimized-kubernetes-version/image_id \
    --region region-code --query "Parameter.Value" --output text
```

Here’s an example command after placeholder replacements have been made.

```
aws ssm get-parameter --name /aws/service/ami-windows-latest/Windows_Server-2022-English-Core-EKS_Optimized-1.33/image_id \
    --region us-west-2 --query "Parameter.Value" --output text
```

An example output is as follows.

```
ami-1234567890abcdef0
```
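
If you script node deployments, you can capture the returned AMI ID in a shell variable and reuse it later, for example in a launch template or CloudFormation parameter. The following is a minimal sketch; the Kubernetes version, Region, and variable name are illustrative only.

```
# Sketch: resolve the latest recommended Windows Server 2022 Full AMI for
# Kubernetes 1.33 in us-west-2 and keep the ID for later use.
WINDOWS_AMI_ID=$(aws ssm get-parameter \
    --name /aws/service/ami-windows-latest/Windows_Server-2022-English-Full-EKS_Optimized-1.33/image_id \
    --region us-west-2 --query "Parameter.Value" --output text)
echo "Using Windows AMI: ${WINDOWS_AMI_ID}"
```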

# Build a custom Windows AMI with Image Builder
<a name="eks-custom-ami-windows"></a>

You can use EC2 Image Builder to create custom Amazon EKS optimized Windows AMIs with one of the following options:
+  [Using an Amazon EKS optimized Windows AMI as a base](#custom-windows-ami-as-base) 
+  [Using the Amazon-managed build component](#custom-windows-ami-build-component) 

With both methods, you must create your own Image Builder recipe. For more information, see [Create a new version of an image recipe](https://docs.aws.amazon.com/imagebuilder/latest/userguide/create-image-recipes.html) in the Image Builder User Guide.

**Important**  
The following **Amazon-managed** components for `eks` include patches for `CVE-2024-5321`:
+ `1.28.2` and higher
+ `1.29.2` and higher
+ `1.30.1` and higher
+ All versions for Kubernetes 1.31 and higher

## Using an Amazon EKS optimized Windows AMI as a base
<a name="custom-windows-ami-as-base"></a>

This option is the recommended way to build your custom Windows AMIs. The Amazon EKS optimized Windows AMIs we provide are more frequently updated than the Amazon-managed build component.

1. Start a new Image Builder recipe.

   1. Open the EC2 Image Builder console at https://console.aws.amazon.com/imagebuilder.

   1. In the left navigation pane, choose **Image recipes**.

   1. Choose **Create image recipe**.

1. In the **Recipe details** section, enter a **Name** and **Version**.

1. Specify the ID of the Amazon EKS optimized Windows AMI in the **Base image** section.

   1. Choose **Enter custom AMI ID**.

   1. Retrieve the AMI ID for the Windows OS version that you require. For more information, see [Retrieve recommended Microsoft Windows AMI IDs](retrieve-windows-ami-id.md).

   1. Enter the custom **AMI ID**. If the AMI ID isn’t found, make sure that the AWS Region for the AMI ID matches the AWS Region shown in the upper right of your console.

1. (Optional) To get the latest security updates, add the `update-windows` component in the **Build components** section.

   1. From the dropdown list to the right of the **Find components by name** search box, choose **Amazon-managed**.

   1. In the **Find components by name** search box, enter `update-windows`.

   1. Select the check box of the ** `update-windows` ** search result. This component includes the latest Windows patches for the operating system.

1. Complete the remaining image recipe inputs with your required configurations. For more information, see [Create a new image recipe version (console)](https://docs.aws.amazon.com/imagebuilder/latest/userguide/create-image-recipes.html#create-image-recipe-version-console) in the Image Builder User Guide.

1. Choose **Create recipe**.

1. Use the new image recipe in a new or existing image pipeline. Once your image pipeline runs successfully, your custom AMI will be listed as an output image and is ready for use. For more information, see [Create an image pipeline using the EC2 Image Builder console wizard](https://docs.aws.amazon.com/imagebuilder/latest/userguide/start-build-image-pipeline.html).
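
If you prefer to script this procedure, the following sketch creates a comparable image recipe with the AWS CLI. The recipe name, semantic version, Region, and base AMI ID are placeholders, and the Amazon-managed `update-windows` component ARN uses the `x.x.x` wildcard to select its latest version.

```
# Sketch: create an image recipe from the CLI, using a previously retrieved
# Amazon EKS optimized Windows AMI ID as the base image. All values shown
# here are placeholders.
BASE_AMI_ID=ami-1234567890abcdef0
aws imagebuilder create-image-recipe \
    --name my-eks-windows-recipe \
    --semantic-version 1.0.0 \
    --parent-image "${BASE_AMI_ID}" \
    --components componentArn=arn:aws:imagebuilder:us-west-2:aws:component/update-windows/x.x.x \
    --region us-west-2
```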

## Using the Amazon-managed build component
<a name="custom-windows-ami-build-component"></a>

When using an Amazon EKS optimized Windows AMI as a base isn’t viable, you can use the Amazon-managed build component instead. This option may lag behind the most recent supported Kubernetes versions.

1. Start a new Image Builder recipe.

   1. Open the EC2 Image Builder console at https://console.aws.amazon.com/imagebuilder.

   1. In the left navigation pane, choose **Image recipes**.

   1. Choose **Create image recipe**.

1. In the **Recipe details** section, enter a **Name** and **Version**.

1. Determine which option you will be using to create your custom AMI in the **Base image** section:
   +  **Select managed images** – Choose **Windows** for your **Image Operating System (OS)**. Then choose one of the following options for **Image origin**.
     +  **Quick start (Amazon-managed)** – In the **Image name** dropdown, choose an Amazon EKS supported Windows Server version. For more information, see [Create nodes with optimized Windows AMIs](eks-optimized-windows-ami.md).
     +  **Images owned by me** – For **Image name**, choose the ARN of your own image with your own license. The image that you provide can’t already have Amazon EKS components installed.
   +  **Enter custom AMI ID** – For AMI ID, enter the ID for your AMI with your own license. The image that you provide can’t already have Amazon EKS components installed.

1. In the **Build components - Windows** section, do the following:

   1. From the dropdown list to the right of the **Find components by name** search box, choose **Amazon-managed**.

   1. In the **Find components by name** search box, enter `eks`.

   1. Select the check box of the ** `eks-optimized-ami-windows` ** search result, even though the result returned may not be the version that you want.

   1. In the **Find components by name** search box, enter `update-windows`.

   1. Select the check box of the **update-windows** search result. This component includes the latest Windows patches for the operating system.

1. In the **Selected components** section, do the following:

   1. Choose **Versioning options** for ** `eks-optimized-ami-windows` **.

   1. Choose **Specify component version**.

   1. In the **Component Version** field, enter *version.x*, replacing *version* with a supported Kubernetes version. For example, entering `1.30.x` uses the latest component version for Kubernetes `1.30`; an *x* in any part of the version number tells Image Builder to use the latest component version that matches the parts of the version you define explicitly. Pay attention to the console output, as it advises you whether your desired version is available as a managed component. Keep in mind that the most recent Kubernetes versions may not be available for the build component. For more information about available versions, see [Retrieving information about `eks-optimized-ami-windows` component versions](#custom-windows-ami-component-versions).

1. Complete the remaining image recipe inputs with your required configurations. For more information, see [Create a new image recipe version (console)](https://docs.aws.amazon.com/imagebuilder/latest/userguide/create-image-recipes.html#create-image-recipe-version-console) in the Image Builder User Guide.

1. Choose **Create recipe**.

1. Use the new image recipe in a new or existing image pipeline. Once your image pipeline runs successfully, your custom AMI will be listed as an output image and is ready for use. For more information, see [Create an image pipeline using the EC2 Image Builder console wizard](https://docs.aws.amazon.com/imagebuilder/latest/userguide/start-build-image-pipeline.html).

## Retrieving information about `eks-optimized-ami-windows` component versions
<a name="custom-windows-ami-component-versions"></a>

You can retrieve specific information about what is installed with each component. For example, you can verify which `kubelet` version is installed. The components go through functional testing on the Amazon EKS supported Windows operating system versions. For more information, see [Release calendar](eks-optimized-windows-ami.md#windows-ami-release-calendar). Any other Windows OS versions that aren’t listed as supported or have reached end of support might not be compatible with the component.

1. Open the EC2 Image Builder console at https://console.aws.amazon.com/imagebuilder.

1. In the left navigation pane, choose **Components**.

1. From the dropdown list to the right of the **Find components by name** search box, change **Owned by me** to **Quick start (Amazon-managed)**.

1. In the **Find components by name** box, enter `eks`.

1. (Optional) To find the most recent versions, sort the **Version** column in descending order by choosing it twice.

1. Choose the ** `eks-optimized-ami-windows` ** link with a desired version.

The **Description** in the resulting page shows the specific information.
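
If you prefer the AWS CLI over the console, the following is a hedged sketch that lists Amazon-managed components whose names contain `eks`. The query expression and output format are assumptions; adjust them to your needs.

```
aws imagebuilder list-components \
    --owner Amazon \
    --query "componentVersionList[?contains(name, 'eks')].[name, version]" \
    --output table
```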

# Detect node health issues and enable automatic node repair
<a name="node-health"></a>

Node health refers to the operational status and capability of a Kubernetes node to effectively run workloads. A healthy node maintains expected network connectivity, has sufficient compute and storage resources, and can successfully run workloads without disruption.

To help with maintaining healthy nodes in EKS clusters, EKS offers the *node monitoring agent* and *automatic node repair*. These features are automatically enabled with EKS Auto Mode compute. You can also use automatic node repair with EKS managed node groups and Karpenter, and you can use the EKS node monitoring agent with any EKS compute type except AWS Fargate. The EKS node monitoring agent and automatic node repair are most effective when used together, but they can also be used individually in EKS clusters.

**Important**  
The *node monitoring agent* and *node auto repair* are only available on Linux. These features aren’t available on Windows.

## Node monitoring agent
<a name="node-monitoring-agent"></a>

The EKS node monitoring agent reads node logs to detect health issues. It parses logs to detect failures and surfaces information about the health status of the nodes. For each category of issues detected, the agent applies a dedicated `NodeCondition` to the worker nodes. For detailed information on the node health issues detected by the EKS node monitoring agent, see [Detect node health issues with the EKS node monitoring agent](node-health-nma.md).

EKS Auto Mode compute includes the node monitoring agent. For other EKS compute types, you can add the node monitoring agent as an EKS add-on or you can manage it with Kubernetes tooling such as Helm. For more information, see [Configure the node monitoring agent](node-health-nma.md#node-monitoring-agent-configure).
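
For example, the following is a minimal AWS CLI sketch for installing the agent as an EKS add-on; the cluster name `my-cluster` is a placeholder.

```
aws eks create-addon \
  --cluster-name my-cluster \
  --addon-name eks-node-monitoring-agent
```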

With the EKS node monitoring agent, the following categories of node health issues are surfaced as node conditions. Note that `Ready`, `DiskPressure`, and `MemoryPressure` are standard Kubernetes node conditions that are surfaced even without the EKS node monitoring agent.


| Node Condition | Description | 
| --- | --- | 
|  AcceleratedHardwareReady  |  AcceleratedHardwareReady indicates whether accelerated hardware (GPU, Neuron) on the node is functioning correctly.  | 
|  ContainerRuntimeReady  |  ContainerRuntimeReady indicates whether the container runtime (containerd, etc.) is functioning correctly and able to run containers.  | 
|  DiskPressure  |  DiskPressure is a standard Kubernetes condition indicating the node is experiencing disk pressure (low disk space or high I/O).  | 
|  KernelReady  |  KernelReady indicates whether the kernel is functioning correctly without critical errors, panics, or resource exhaustion.  | 
|  MemoryPressure  |  MemoryPressure is a standard Kubernetes condition indicating the node is experiencing memory pressure (low available memory).  | 
|  NetworkingReady  |  NetworkingReady indicates whether the node’s networking stack is functioning correctly (interfaces, routing, connectivity).  | 
|  StorageReady  |  StorageReady indicates whether the node’s storage subsystem is functioning correctly (disks, filesystems, I/O).  | 
|  Ready  |  Ready is the standard Kubernetes condition indicating the node is healthy and ready to accept pods.  | 

## Automatic node repair
<a name="node-auto-repair"></a>

EKS automatic node repair continuously monitors node health, reacts to detected problems, and replaces or reboots nodes when possible. This improves cluster reliability with minimal manual intervention and helps reduce application downtime.

By itself, EKS automatic node repair reacts to the `Ready` conditions of the kubelet, any manually deleted node objects, and EKS managed node group instances that fail to join the cluster. When EKS automatic node repair is enabled with the node monitoring agent installed, EKS automatic node repair reacts to additional node conditions: `AcceleratedHardwareReady`, `ContainerRuntimeReady`, `KernelReady`, `NetworkingReady`, and `StorageReady`.

EKS automatic node repair does not react to standard Kubernetes `DiskPressure`, `MemoryPressure`, or `PIDPressure` node conditions. These conditions often indicate issues with application behavior, workload configuration, or resource limits rather than node-level failures, making it difficult to determine an appropriate default repair action. In these scenarios, workloads are subject to the Kubernetes [node pressure eviction behavior](https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction).

For more information on EKS automatic node repair, see [Automatically repair nodes in EKS clusters](node-repair.md).

**Topics**
+ [Detect node health issues with the EKS node monitoring agent](node-health-nma.md)
+ [Automatically repair nodes in EKS clusters](node-repair.md)
+ [View the health status of your nodes](learn-status-conditions.md)
+ [Retrieve node logs for a managed node using kubectl and S3](auto-get-logs.md)
+ [Capture network traffic on a managed node using kubectl and S3](auto-get-tcpdump.md)

# Detect node health issues with the EKS node monitoring agent
<a name="node-health-nma"></a>

This topic details the node health issues detected by the EKS node monitoring agent, how those issues are surfaced as node conditions or events, and how to configure the node monitoring agent.

The EKS node monitoring agent can be used with or without EKS automatic node repair. For more information on EKS automatic node repair, see [Automatically repair nodes in EKS clusters](node-repair.md).

The source code for the EKS node monitoring agent is published on GitHub in the [aws/eks-node-monitoring-agent](https://github.com/aws/eks-node-monitoring-agent) repository.

## Node health issues
<a name="node-health-issues"></a>

The following tables describe node health issues that can be detected by the node monitoring agent. There are two types of issues:
+ Condition – A terminal issue that warrants a remediation action, such as an instance replacement or reboot. When auto repair is enabled, Amazon EKS performs a repair action, either a node replacement or a reboot. For more information, see [Node conditions](learn-status-conditions.md#status-node-conditions).
+ Event – A temporary issue or sub-optimal node configuration. No auto repair action will take place. For more information, see [Node events](learn-status-conditions.md#status-node-events).

## AcceleratedHardware node health issues
<a name="node-health-AcceleratedHardware"></a>

The monitoring condition is `AcceleratedHardwareReady` for issues in the following table that have a severity of “Condition”. The events and conditions in the table below are for NVIDIA and Neuron related node health issues.


| Name | Severity | Description | Repair Action | 
| --- | --- | --- | --- | 
|  DCGMDiagnosticFailure  |  Condition  |  A test case from the DCGM active diagnostics test suite failed.  |  None  | 
|  DCGMError  |  Condition  |  Connection to the DCGM host process was lost or could not be established.  |  None  | 
|  DCGMFieldError[Code]  |  Event  |  DCGM detected GPU degradation through a field identifier.  |  None  | 
|  DCGMHealthCode[Code]  |  Event  |  A DCGM health check failed in a non-fatal manner.  |  None  | 
|  DCGMHealthCode[Code]  |  Condition  |  A DCGM health check failed in a fatal manner.  |  None  | 
|  NeuronDMAError  |  Condition  |  A DMA engine encountered an unrecoverable error.  |  Replace  | 
|  NeuronHBMUncorrectableError  |  Condition  |  An HBM encountered an uncorrectable error and produced incorrect results.  |  Replace  | 
|  NeuronNCUncorrectableError  |  Condition  |  A Neuron Core uncorrectable memory error was detected.  |  Replace  | 
|  NeuronSRAMUncorrectableError  |  Condition  |  An on-chip SRAM encountered a parity error and produced incorrect results.  |  Replace  | 
|  NvidiaDeviceCountMismatch  |  Event  |  The number of GPUs visible through NVML is inconsistent with the NVIDIA device count on the filesystem.  |  None  | 
|  NvidiaDoubleBitError  |  Condition  |  A double bit error was produced by the GPU driver.  |  Replace  | 
|  NvidiaNCCLError  |  Event  |  A segfault occurred in the NVIDIA Collective Communications library (`libnccl`).  |  None  | 
|  NvidiaNVLinkError  |  Condition  |  NVLink errors were reported by the GPU driver.  |  Replace  | 
|  NvidiaPCIeError  |  Event  |  PCIe replays were triggered to recover from transmission errors.  |  None  | 
|  NvidiaPageRetirement  |  Event  |  The GPU driver has marked a memory page for retirement. This may occur if there is a single double bit error or two single bit errors are encountered at the same address.  |  None  | 
|  NvidiaPowerError  |  Event  |  Power utilization of GPUs breached the allowed thresholds.  |  None  | 
|  NvidiaThermalError  |  Event  |  Thermal status of GPUs breached the allowed thresholds.  |  None  | 
|  NvidiaXID[Code]Error  |  Condition  |  A critical GPU error occurred.  |  Replace or Reboot  | 
|  NvidiaXID[Code]Warning  |  Event  |  A non-critical GPU error occurred.  |  None  | 

## NVIDIA XID error codes
<a name="nvidia-xid-codes"></a>

The node monitoring agent detects NVIDIA XID errors from GPU kernel logs. XID errors fall into two categories:
+  **Well-known XID codes** – Critical errors that set a node condition (`AcceleratedHardwareReady=False`) and trigger auto repair when enabled. The reason code format is `NvidiaXID[Code]Error`. The well-known XID codes that the EKS node monitoring agent detects may not represent the full list of NVIDIA XID codes that require repair actions.
+  **Unknown XID codes** – Logged as Kubernetes events only. These don’t trigger auto repair. The reason code format is `NvidiaXID[Code]Warning`. To investigate unknown XID errors, review your kernel logs with `dmesg | grep -i nvrm`.

For more information on XID errors, see [Xid Errors](https://docs.nvidia.com/deploy/xid-errors/index.html#topic_5_1) in the *NVIDIA GPU Deployment and Management Documentation*. For more information on the individual XID messages, see [Understanding Xid Messages](https://docs.nvidia.com/deploy/gpu-debug-guidelines/index.html#understanding-xid-messages) in the *NVIDIA GPU Deployment and Management Documentation*.

The following table lists the well-known XID codes, their meanings, and the default node repair action if enabled.


| XID Code | Description | Repair Action | 
| --- | --- | --- | 
|  13  |  Graphics Engine Exception – A GPU graphics engine error occurred, typically caused by software issues or driver bugs.  |  Reboot  | 
|  31  |  GPU memory page fault – An application attempted to access GPU memory that is not mapped or accessible.  |  Reboot  | 
|  48  |  Double Bit ECC Error – An uncorrectable double-bit error occurred in GPU memory, indicating potential hardware degradation.  |  Reboot  | 
|  63  |  GPU memory remapping event – The GPU driver remapped a portion of GPU memory due to detected errors. This is often recoverable.  |  Reboot  | 
|  64  |  GPU memory remapping failure – The GPU was unable to remap defective memory, indicating hardware issues.  |  Reboot  | 
|  74  |  NVLink Error – An error occurred on the high-speed NVLink interconnect between GPUs.  |  Replace  | 
|  79  |  GPU has fallen off the bus – The GPU is no longer accessible via PCIe, typically indicating a hardware failure or power issue.  |  Replace  | 
|  94  |  Contained memory error – A memory error occurred but was contained and did not affect other applications.  |  Reboot  | 
|  95  |  Uncontained memory error – A memory error occurred that may have affected other applications or system memory.  |  Reboot  | 
|  119  |  GSP RPC Timeout – Communication with the GPU System Processor timed out, possibly due to firmware issues.  |  Replace  | 
|  120  |  GSP Error – An error occurred in the GPU System Processor.  |  Replace  | 
|  121  |  C2C Error – An error occurred on the chip-to-chip interconnect (used in multi-die GPUs).  |  Replace  | 
|  140  |  ECC Unrecovered Error – An ECC error escaped containment and may have corrupted data.  |  Replace  | 

To view the current node conditions related to GPU health, run the following command.

```
kubectl get nodes -o custom-columns='NAME:.metadata.name,ACCELERATOR_READY:.status.conditions[?(@.type=="AcceleratedHardwareReady")].status,REASON:.status.conditions[?(@.type=="AcceleratedHardwareReady")].reason'
```

To view XID-related events on your cluster, run the following command.

```
kubectl get events | grep -i "NvidiaXID"
```
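
To narrow the search to a single node, you can combine the same filter with a field selector. Replace `node-name` with the name of your node.

```
kubectl get events --field-selector involvedObject.kind=Node,involvedObject.name=node-name | grep -i "NvidiaXID"
```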

## ContainerRuntime node health issues
<a name="node-health-ContainerRuntime"></a>

The monitoring condition is `ContainerRuntimeReady` for issues in the following table that have a severity of “Condition”.


| Name | Severity | Description | Repair Action | 
| --- | --- | --- | --- | 
|  ContainerRuntimeFailed  |  Event  |  The container runtime failed to create a container. If this occurs repeatedly, it is likely related to other reported issues.  |  None  | 
|  DeprecatedContainerdConfiguration  |  Event  |  A container image using deprecated image manifest version 2, schema 1 was recently pulled onto the node through `containerd`.  |  None  | 
|  KubeletFailed  |  Event  |  The kubelet entered a failed state.  |  None  | 
|  LivenessProbeFailures  |  Event  |  A liveness probe failure was detected, potentially indicating application code issues or insufficient timeout values if occurring repeatedly.  |  None  | 
|  PodStuckTerminating  |  Condition  |  A Pod is or was stuck terminating for an excessive amount of time, which can be caused by CRI errors preventing pod state progression.  |  Replace  | 
|  ReadinessProbeFailures  |  Event  |  A readiness probe failure was detected, potentially indicating application code issues or insufficient timeout values if occurring repeatedly.  |  None  | 
|  [Name]RepeatedRestart  |  Event  |  A systemd unit is restarting frequently.  |  None  | 
|  ServiceFailedToStart  |  Event  |  A systemd unit failed to start.  |  None  | 

## Kernel node health issues
<a name="node-health-Kernel"></a>

The monitoring condition is `KernelReady` for issues in the following table that have a severity of “Condition”.


| Name | Severity | Description | Repair Action | 
| --- | --- | --- | --- | 
|  AppBlocked  |  Event  |  A task has been blocked from scheduling for a long period of time, usually because it is blocked on input or output.  |  None  | 
|  AppCrash  |  Event  |  An application on the node has crashed.  |  None  | 
|  ApproachingKernelPidMax  |  Event  |  The number of processes is approaching the maximum number of PIDs that are available per the current `kernel.pid_max` setting, after which no more processes can be launched.  |  None  | 
|  ApproachingMaxOpenFiles  |  Event  |  The number of open files is approaching the maximum number of possible open files given the current kernel settings, after which opening new files will fail.  |  None  | 
|  ConntrackExceededKernel  |  Event  |  Connection tracking exceeded the maximum for the kernel and new connections could not be established, which can result in packet loss.  |  None  | 
|  ExcessiveZombieProcesses  |  Event  |  Processes which can’t be fully reclaimed are accumulating in large numbers, which indicates application issues and may lead to reaching system process limits.  |  None  | 
|  ForkFailedOutOfPIDs  |  Condition  |  A fork or exec call has failed due to the system being out of process IDs or memory, which may be caused by zombie processes or physical memory exhaustion.  |  Replace  | 
|  KernelBug  |  Event  |  A kernel bug was detected and reported by the Linux kernel itself, though this may sometimes be caused by nodes with high CPU or memory usage leading to delayed event processing.  |  None  | 
|  LargeEnvironment  |  Event  |  The number of environment variables for this process is larger than expected, potentially caused by many services with `enableServiceLinks` set to true, which may cause performance issues.  |  None  | 
|  RapidCron  |  Event  |  A cron job is running faster than every five minutes on this node, which may impact performance if the job consumes significant resources.  |  None  | 
|  SoftLockup  |  Event  |  The CPU stalled for a given amount of time.  |  None  | 

## Networking node health issues
<a name="node-health-Networking"></a>

The monitoring condition is `NetworkingReady` for issues in the following table that have a severity of “Condition”.


| Name | Severity | Description | Repair Action | 
| --- | --- | --- | --- | 
|  BandwidthInExceeded  |  Event  |  Packets have been queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance.  |  None  | 
|  BandwidthOutExceeded  |  Event  |  Packets have been queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.  |  None  | 
|  ConntrackExceeded  |  Event  |  Connection tracking exceeded the maximum for the instance and new connections could not be established, which can result in packet loss.  |  None  | 
|  EFAErrorMetric  |  Event  |  EFA driver metrics show an interface with performance degradation.  |  None  | 
|  IPAMDInconsistentState  |  Event  |  The state of the IPAMD checkpoint on disk does not reflect the IPs in the container runtime.  |  None  | 
|  IPAMDNoIPs  |  Event  |  IPAMD is out of IP addresses.  |  None  | 
|  IPAMDNotReady  |  Condition  |  IPAMD fails to connect to the API server.  |  Replace  | 
|  IPAMDNotRunning  |  Condition  |  The Amazon VPC CNI process was not found to be running.  |  Replace  | 
|  IPAMDRepeatedlyRestart  |  Event  |  Multiple restarts in the IPAMD service have occurred.  |  None  | 
|  InterfaceNotRunning  |  Condition  |  This interface appears to not be running or there are network issues.  |  Replace  | 
|  InterfaceNotUp  |  Condition  |  This interface appears to not be up or there are network issues.  |  Replace  | 
|  KubeProxyNotReady  |  Event  |  Kube-proxy failed to watch or list resources.  |  None  | 
|  LinkLocalExceeded  |  Event  |  Packets were dropped because the PPS of traffic to local proxy services exceeded the network interface maximum.  |  None  | 
|  MACAddressPolicyMisconfigured  |  Event  |  The systemd-networkd link configuration has the incorrect `MACAddressPolicy` value.  |  None  | 
|  MissingDefaultRoutes  |  Event  |  There are missing default route rules.  |  None  | 
|  MissingIPRoutes  |  Event  |  There are missing routes for Pod IPs.  |  None  | 
|  MissingIPRules  |  Event  |  There are missing rules for Pod IPs.  |  None  | 
|  MissingLoopbackInterface  |  Condition  |  The loopback interface is missing from this instance, causing failure of services depending on local connectivity.  |  Replace  | 
|  NetworkSysctl  |  Event  |  This node’s network `sysctl` settings are potentially incorrect.  |  None  | 
|  PPSExceeded  |  Event  |  Packets have been queued or dropped because the bidirectional PPS exceeded the maximum for the instance.  |  None  | 
|  PortConflict  |  Event  |  If a Pod uses hostPort, it can write `iptables` rules that override the host’s already bound ports, potentially preventing API server access to `kubelet`.  |  None  | 
|  UnexpectedRejectRule  |  Event  |  An unexpected `REJECT` or `DROP` rule was found in the `iptables`, potentially blocking expected traffic.  |  None  | 

## Storage node health issues
<a name="node-health-Storage"></a>

The monitoring condition is `StorageReady` for issues in the following table that have a severity of “Condition”.


| Name | Severity | Description | Repair Action | 
| --- | --- | --- | --- | 
|  EBSInstanceIOPSExceeded  |  Event  |  Maximum IOPS for the instance was exceeded.  |  None  | 
|  EBSInstanceThroughputExceeded  |  Event  |  Maximum Throughput for the instance was exceeded.  |  None  | 
|  EBSVolumeIOPSExceeded  |  Event  |  Maximum IOPS to a particular EBS Volume was exceeded.  |  None  | 
|  EBSVolumeThroughputExceeded  |  Event  |  Maximum Throughput to a particular Amazon EBS volume was exceeded.  |  None  | 
|  EtcHostsMountFailed  |  Event  |  Mounting of the kubelet generated `/etc/hosts` failed due to userdata remounting `/var/lib/kubelet/pods` during `kubelet-container` operation.  |  None  | 
|  IODelays  |  Event  |  Input or output delay detected in a process, potentially indicating insufficient input-output provisioning if excessive.  |  None  | 
|  KubeletDiskUsageSlow  |  Event  |  The `kubelet` is reporting slow disk usage while trying to access the filesystem. This potentially indicates insufficient disk input-output or filesystem issues.  |  None  | 
|  XFSSmallAverageClusterSize  |  Event  |  The XFS Average Cluster size is small, indicating excessive free space fragmentation. This can prevent file creation despite available inodes or free space.  |  None  | 

## Configure the node monitoring agent
<a name="node-monitoring-agent-configure"></a>

The EKS node monitoring agent is deployed as a DaemonSet. When you deploy it as an EKS add-on, you can customize the installation with the following configuration values. For default configurations, reference the EKS node monitoring agent [Helm chart](https://github.com/aws/eks-node-monitoring-agent/blob/main/charts/eks-node-monitoring-agent/values.yaml).


| Configuration Option | Description | 
| --- | --- | 
|   `monitoringAgent.resources.requests.cpu`   |  CPU resource request for the monitoring agent.  | 
|   `monitoringAgent.resources.requests.memory`   |  Memory resource request for the monitoring agent.  | 
|   `monitoringAgent.resources.limits.cpu`   |  CPU resource limit for the monitoring agent.  | 
|   `monitoringAgent.resources.limits.memory`   |  Memory resource limit for the monitoring agent.  | 
|   `monitoringAgent.tolerations`   |  Tolerations for scheduling the monitoring agent on tainted nodes.  | 
|   `monitoringAgent.additionalArgs`   |  Additional command-line arguments to pass to the monitoring agent.  | 

**Note**  
You can configure `hostname-override` and `verbosity` as `monitoringAgent.additionalArgs` with EKS add-ons or Helm installation. You currently cannot customize the node monitoring agent’s `probe-address` (`8002`) or `metrics-address` (`8003`) via additional args with EKS add-ons or Helm installation.
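
As an illustration, the following is a hedged sketch of passing one of the options above as EKS add-on configuration values. The cluster name and resource values are placeholders; confirm the exact schema with the `aws eks describe-addon-configuration` command shown later in this topic before applying.

```
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name eks-node-monitoring-agent \
  --configuration-values '{"monitoringAgent":{"resources":{"requests":{"cpu":"50m","memory":"64Mi"}}}}'
```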

The node monitoring agent includes an NVIDIA DCGM (Data Center GPU Manager) server component (`nv-hostengine`) for monitoring NVIDIA GPUs. This component runs only on nodes that are NVIDIA GPU instance types, as shown by the `nodeAffinity` in the agent’s [Helm chart](https://github.com/aws/eks-node-monitoring-agent/blob/main/charts/eks-node-monitoring-agent/values.yaml). You cannot use an existing NVIDIA DCGM installation with the EKS node monitoring agent. If you require this functionality, provide feedback on the EKS roadmap [GitHub issue #2763](https://github.com/aws/containers-roadmap/issues/2763).

When you deploy the EKS node monitoring agent as an EKS add-on, you can customize the NVIDIA DCGM installation with the following configuration values.


| Configuration Option | Description | 
| --- | --- | 
|   `dcgmAgent.resources.requests.cpu`   |  CPU resource request for the DCGM agent.  | 
|   `dcgmAgent.resources.requests.memory`   |  Memory resource request for the DCGM agent.  | 
|   `dcgmAgent.resources.limits.cpu`   |  CPU resource limit for the DCGM agent.  | 
|   `dcgmAgent.resources.limits.memory`   |  Memory resource limit for the DCGM agent.  | 
|   `dcgmAgent.tolerations`   |  Tolerations for scheduling the DCGM agent on tainted nodes.  | 

You can use the following AWS CLI commands to get useful information about the versions and schema for the EKS node monitoring agent EKS add-on.

Get the latest agent add-on version for your Kubernetes version. Replace `1.35` with your Kubernetes version.

```
aws eks describe-addon-versions \
  --addon-name eks-node-monitoring-agent \
  --kubernetes-version 1.35 \
  --query='addons[].addonVersions[].addonVersion'
```

Get the agent add-on schema supported in EKS add-ons. Replace `v1.5.1-eksbuild.1` with your agent version.

```
aws eks describe-addon-configuration \
  --addon-name eks-node-monitoring-agent \
  --addon-version v1.5.1-eksbuild.1
```

# Automatically repair nodes in EKS clusters
<a name="node-repair"></a>

This topic details the EKS automatic node repair behavior and how to configure it to meet your requirements. EKS automatic node repair is enabled by default in EKS Auto Mode, and can be used with EKS managed node groups and Karpenter.

The default EKS automatic node repair actions are summarized in the following table and apply to EKS Auto Mode, EKS managed node groups, and Karpenter. When using EKS Auto Mode or Karpenter, all `AcceleratedHardwareReady` repair actions are `Replace`; only EKS managed node groups support `Reboot` as a repair action.

For a detailed list of node health issues detected by the EKS node monitoring agent and their corresponding node repair actions, see [Detect node health issues with the EKS node monitoring agent](node-health-nma.md).


| Node Condition | Description | Repair after | Repair action(s) | 
| --- | --- | --- | --- | 
|  AcceleratedHardwareReady  |  AcceleratedHardwareReady indicates whether accelerated hardware (GPU, Neuron) on the node is functioning correctly.  |  10m  |  Replace or Reboot  | 
|  ContainerRuntimeReady  |  ContainerRuntimeReady indicates whether the container runtime (containerd, etc.) is functioning correctly and able to run containers.  |  30m  |  Replace  | 
|  DiskPressure  |  DiskPressure is a standard Kubernetes condition indicating the node is experiencing disk pressure (low disk space or high I/O).  |  N/A  |  None  | 
|  KernelReady  |  KernelReady indicates whether the kernel is functioning correctly without critical errors, panics, or resource exhaustion.  |  30m  |  Replace  | 
|  MemoryPressure  |  MemoryPressure is a standard Kubernetes condition indicating the node is experiencing memory pressure (low available memory).  |  N/A  |  None  | 
|  NetworkingReady  |  NetworkingReady indicates whether the node’s networking stack is functioning correctly (interfaces, routing, connectivity).  |  30m  |  Replace  | 
|  StorageReady  |  StorageReady indicates whether the node’s storage subsystem is functioning correctly (disks, filesystems, I/O).  |  30m  |  Replace  | 
|  Ready  |  Ready is the standard Kubernetes condition indicating the node is healthy and ready to accept pods.  |  30m  |  Replace  | 

EKS automatic node repair actions are disabled in the following scenarios by default. In-progress node repair actions continue in each scenario. See [Configure automatic node repair](#configure-node-repair) for how to override these default settings.

 **EKS managed node groups** 
+ The node group has more than five nodes and more than 20% of the nodes in the node group are unhealthy.
+ A zonal shift is triggered for your cluster through the Application Recovery Controller (ARC).

 **EKS Auto Mode and Karpenter** 
+ More than 20% of the nodes in the NodePool are unhealthy.
+ For standalone NodeClaims, 20% of nodes in the cluster are unhealthy.

## Configure automatic node repair
<a name="configure-node-repair"></a>

Automatic node repair cannot be configured when using EKS Auto Mode; it is always enabled and uses the same default settings as Karpenter.

### Karpenter
<a name="configure-node-repair-karpenter"></a>

To use automatic node repair with Karpenter, enable the feature gate `NodeRepair=true`. You can enable the feature gates through the `--feature-gates` CLI option or the `FEATURE_GATES` environment variable in the Karpenter deployment. For more information, see the [Karpenter documentation](https://karpenter.sh/docs/concepts/disruption/#node-auto-repair).
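
For example, the following is a hedged sketch that sets the environment variable directly on the Karpenter controller deployment. The deployment name `karpenter` and the `kube-system` namespace are assumptions about a typical installation, and this overwrites any existing `FEATURE_GATES` value. With a Helm-based installation, set the feature gate through your Helm values instead.

```
# Sketch: enable the NodeRepair feature gate on an assumed "karpenter" deployment
# in the "kube-system" namespace. This replaces any existing FEATURE_GATES value.
kubectl set env -n kube-system deployment/karpenter FEATURE_GATES="NodeRepair=true"
```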

### Managed node groups
<a name="configure-node-repair-mng"></a>

You can enable automatic node repair when creating new EKS managed node groups or by updating existing EKS managed node groups.
+  **Amazon EKS console** – Select the **Enable node auto repair** checkbox for the managed node group. For more information, see [Create a managed node group for your cluster](create-managed-node-group.md).
+  ** AWS CLI** – Add `--node-repair-config enabled=true` to the [https://docs.aws.amazon.com/cli/latest/reference/eks/create-nodegroup.html](https://docs.aws.amazon.com/cli/latest/reference/eks/create-nodegroup.html) or [https://docs.aws.amazon.com/cli/latest/reference/eks/update-nodegroup-config.html](https://docs.aws.amazon.com/cli/latest/reference/eks/update-nodegroup-config.html) command.
+  **eksctl** – Configure `managedNodeGroups.nodeRepairConfig.enabled: true`. See the example in the [eksctl GitHub](https://github.com/eksctl-io/eksctl/blob/main/examples/44-node-repair.yaml) or the sketch following this list.
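
For reference, the following is a minimal eksctl ClusterConfig sketch with node repair enabled. The cluster name, Region, and node group name are placeholders.

```
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
managedNodeGroups:
  - name: my-nodegroup
    nodeRepairConfig:
      enabled: true
```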

When using EKS managed node groups, you can control node auto repair behavior with the following settings.

To control when node auto repair stops taking action, set a threshold based on the number of unhealthy nodes in the node group. Set either the absolute count or percentage, but not both.


| Setting | Description | 
| --- | --- | 
|   `maxUnhealthyNodeThresholdCount`   |  The absolute number of unhealthy nodes above which node auto repair stops. Use this to limit the scope of repairs.  | 
|   `maxUnhealthyNodeThresholdPercentage`   |  The percentage of unhealthy nodes above which node auto repair stops (0-100).  | 

To control how many nodes repair at the same time, you can configure repair parallelism. As with the unhealthy node threshold, set either the absolute count or percentage, but not both.


| Setting | Description | 
| --- | --- | 
|   `maxParallelNodesRepairedCount`   |  The maximum number of nodes to repair concurrently.  | 
|   `maxParallelNodesRepairedPercentage`   |  The maximum percentage of unhealthy nodes to repair concurrently (0-100).  | 

With `nodeRepairConfigOverrides`, you can customize repair behavior for specific conditions. Use this when you need different repair actions or wait times for different issue types.

Each override requires all of the following fields:


| Field | Description | 
| --- | --- | 
|   `nodeMonitoringCondition`   |  The node condition type reported by the node monitoring agent. For example: `AcceleratedHardwareReady`, `NetworkingReady`, `StorageReady`, `KernelReady`.  | 
|   `nodeUnhealthyReason`   |  The specific reason code for the unhealthy condition. For example: `NvidiaXID31Error`, `IPAMDNotRunning`.  | 
|   `minRepairWaitTimeMins`   |  The minimum time in minutes that the condition must persist before the node becomes eligible for repair. Use this to avoid repairing nodes for temporary issues.  | 
|   `repairAction`   |  The action to take when conditions are met. Valid values: `Replace` (terminate and replace the node), `Reboot` (reboot the node), or `NoAction` (no repair actions).  | 

The following AWS CLI example creates a node group with custom repair settings.

```
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --node-role arn:aws:iam::111122223333:role/NodeRole \
  --subnets subnet-0123456789abcdef0 \
  --node-repair-config '{
    "enabled": true,
    "maxUnhealthyNodeThresholdPercentage": 10,
    "maxParallelNodesRepairedCount": 3,
    "nodeRepairConfigOverrides": [
      {
        "nodeMonitoringCondition": "AcceleratedHardwareReady",
        "nodeUnhealthyReason": "NvidiaXID64Error",
        "minRepairWaitTimeMins": 5,
        "repairAction": "Replace"
      },
      {
        "nodeMonitoringCondition": "AcceleratedHardwareReady",
        "nodeUnhealthyReason": "NvidiaXID31Error",
        "minRepairWaitTimeMins": 15,
        "repairAction": "NoAction"
      }
    ]
  }'
```

This configuration does the following:
+ Enables node auto repair
+ Stops repair actions when more than 10% of nodes are unhealthy
+ Repairs up to 3 nodes at a time
+ Overrides XID 64 errors (GPU memory remapping failure) to replace the node after 5 minutes. The default is reboot after 10 minutes.
+ Overrides XID 31 errors (GPU memory page fault) to take no action. The default is reboot after 10 minutes.

# View the health status of your nodes
<a name="learn-status-conditions"></a>

This topic explains the tools and methods available for monitoring node health status in Amazon EKS clusters. The information covers node conditions, events, and detection cases that help you identify and diagnose node-level issues. Use the commands and patterns described here to inspect node health resources, interpret status conditions, and analyze node events for operational troubleshooting.

You can get some node health information for all nodes with standard Kubernetes commands. If you use the node monitoring agent through Amazon EKS Auto Mode or the Amazon EKS managed add-on, you get a wider variety of node signals to help with troubleshooting. Descriptions of the health issues detected by the node monitoring agent are also available in the observability dashboard. For more information, see [Detect node health issues with the EKS node monitoring agent](node-health-nma.md).

## Node conditions
<a name="status-node-conditions"></a>

Node conditions represent terminal issues requiring remediation actions like instance replacement or reboot.

 **To get conditions for all nodes:** 

```
kubectl get nodes -o 'custom-columns=NAME:.metadata.name,CONDITIONS:.status.conditions[*].type,STATUS:.status.conditions[*].status'
```

 **To get detailed conditions for a specific node** 

```
kubectl describe node node-name
```

 **Example condition output of a healthy node:** 

```
  - lastHeartbeatTime: "2024-11-21T19:07:40Z"
    lastTransitionTime: "2024-11-08T03:57:40Z"
    message: Monitoring for the Networking system is active
    reason: NetworkingIsReady
    status: "True"
    type: NetworkingReady
```

 **Example condition output of an unhealthy node with a networking problem:** 

```
  - lastHeartbeatTime: "2024-11-21T19:12:29Z"
    lastTransitionTime: "2024-11-08T17:04:17Z"
    message: IPAM-D has failed to connect to API Server which could be an issue with
      IPTable rules or any other network configuration.
    reason: IPAMDNotReady
    status: "False"
    type: NetworkingReady
```
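
To check this condition across all nodes at once, you can use a custom-columns query similar to the GPU example earlier in this guide.

```
kubectl get nodes -o custom-columns='NAME:.metadata.name,NETWORKING_READY:.status.conditions[?(@.type=="NetworkingReady")].status,REASON:.status.conditions[?(@.type=="NetworkingReady")].reason'
```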

## Node events
<a name="status-node-events"></a>

Node events indicate temporary issues or sub-optimal configurations.

 **To get all events reported by the node monitoring agent** 

When the node monitoring agent is available, you can run the following command.

```
kubectl get events --field-selector=reportingComponent=eks-node-monitoring-agent
```

Sample output:

```
LAST SEEN   TYPE      REASON       OBJECT                                              MESSAGE
4s          Warning   SoftLockup   node/ip-192-168-71-251.us-west-2.compute.internal   CPU stuck for 23s
```

 **To get events for all nodes** 

```
kubectl get events --field-selector involvedObject.kind=Node
```

 **To get events for a specific node** 

```
kubectl get events --field-selector involvedObject.kind=Node,involvedObject.name=node-name
```

 **To watch events in real-time** 

```
kubectl get events -w --field-selector involvedObject.kind=Node
```

 **Example event output:** 

```
LAST SEEN   TYPE     REASON           OBJECT         MESSAGE
2m          Warning  MemoryPressure   Node/node-1    Node experiencing memory pressure
5m          Normal   NodeReady        Node/node-1    Node became ready
```

## Common troubleshooting commands
<a name="status-node-troubleshooting"></a>

```
# Get comprehensive node status
kubectl get node node-name -o yaml

# Watch node status changes
kubectl get nodes -w

# Get node metrics
kubectl top node
```

# Retrieve node logs for a managed node using kubectl and S3
<a name="auto-get-logs"></a>

Learn how to retrieve node logs for an Amazon EKS managed node that has the node monitoring agent.

## Prerequisites
<a name="_prerequisites"></a>

Make sure you have the following:
+ An existing Amazon EKS cluster with the node monitoring agent. For more information, see [Detect node health issues and enable automatic node repair](node-health.md).
+ The `kubectl` command-line tool installed and configured to communicate with your cluster.
+ The AWS CLI installed and logged in with sufficient permissions to create S3 buckets and objects.
+ A recent version of Python 3 installed.
+ The AWS SDK for Python (Boto3) installed.
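
If you don’t yet have Boto3, one way to install it is with pip, for example:

```
python3 -m pip install boto3
```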

## Step 1: Create S3 bucket destination (optional)
<a name="_step_1_create_s3_bucket_destination_optional"></a>

If you don’t already have an S3 bucket to store the logs, create one. Use the following AWS CLI command. The bucket defaults to the `private` access control list. Replace *bucket-name* with your chosen unique bucket name.

```
aws s3api create-bucket --bucket <bucket-name>
```

## Step 2: Create pre-signed S3 URL for HTTP Put
<a name="_step_2_create_pre_signed_s3_url_for_http_put"></a>

Amazon EKS returns the node logs by performing an HTTP PUT operation to a URL that you specify. In this tutorial, we generate a pre-signed S3 HTTP PUT URL.

The logs will be returned as a gzip tarball, with the `.tar.gz` extension.

**Note**  
You must use the AWS API or an AWS SDK to create the pre-signed S3 upload URL for EKS to upload the log file. You cannot create a pre-signed S3 upload URL using the AWS CLI.

1. Determine where in the bucket you want to store the logs. For example, you might use *2024-11-12/logs1.tar.gz* as the key.

1. Save the following Python code to the file *presign-upload.py*. Replace *<bucket-name>* and *<key>*. The key should end with `.tar.gz`.

   ```
   import boto3
   
   # Generate a pre-signed HTTP PUT URL that EKS uses to upload the log tarball.
   print(boto3.client('s3').generate_presigned_url(
       ClientMethod='put_object',
       Params={'Bucket': '<bucket-name>', 'Key': '<key>'},
       ExpiresIn=1000,
   ))
   ```

1. Run the script with the following command.

   ```
   python presign-upload.py
   ```

1. Note the URL output. Use this value in the next step as the *http-put-destination*.

For more information, see [Generate a presigned URL to upload a file](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-presigned-urls.html#generating-a-presigned-url-to-upload-a-file) in the AWS Boto3 SDK for Python Documentation.

## Step 3: Create NodeDiagnostic resource
<a name="_step_3_create_nodediagnostic_resource"></a>

Identify the name of the node you want to collect logs from.

Create a `NodeDiagnostic` manifest (for example, `nodediagnostic.yaml`) that uses the name of the node as the resource’s name and provides an HTTP PUT URL destination.

```
apiVersion: eks.amazonaws.com/v1alpha1
kind: NodeDiagnostic
metadata:
    name: <node-name>
spec:
    logCapture:
        destination: <http-put-destination>
```

Apply the manifest to the cluster.

```
kubectl apply -f nodediagnostic.yaml
```

You can check on the status of the collection by describing the `NodeDiagnostic` resource:
+ A status of `Success` or `SuccessWithErrors` indicates that the task completed and the logs were uploaded to the provided destination (`SuccessWithErrors` indicates that some logs might be missing).
+ If the status is `Failure`, confirm that the upload URL is well-formed and not expired.

```
kubectl describe nodediagnostics.eks.amazonaws.com/<node-name>
```

## Step 4: Download logs from S3
<a name="_step_4_download_logs_from_s3"></a>

Wait approximately one minute before attempting to download the logs. Then, use the S3 CLI to download the logs.

```
# Once NodeDiagnostic shows Success status, download the logs
aws s3 cp s3://<bucket-name>/<key> ./<path-to-node-logs>.tar.gz
```

## Step 5: Clean up NodeDiagnostic resource
<a name="_step_5_clean_up_nodediagnostic_resource"></a>
+  `NodeDiagnostic` resources are not automatically deleted. Clean them up on your own after you have obtained your log artifacts.

```
# Delete the NodeDiagnostic resource
kubectl delete nodediagnostics.eks.amazonaws.com/<node-name>
```

## NodeDiagnostic `node` Destination
<a name="_nodediagnostic_node_destination"></a>

Starting with version `v1.6.1-eksbuild.1` of the Node Monitoring Agent, there is an option to set the log collection destination to `node`. Using this destination collects the logs and temporarily persists them on the node for later retrieval. The Node Monitoring Agent’s GitHub repository also includes a `kubectl` plugin that you can install for easy interaction and log collection. For more information, see the [documentation for the `kubectl ekslogs` plugin](https://github.com/aws/eks-node-monitoring-agent/blob/main/tools/kubectl-ekslogs/README.md).

## Example Usage
<a name="_example_usage"></a>

```
# Collect NodeDiagnostic logs from a single node
kubectl ekslogs <node-name>

# Collect NodeDiagnostic logs from multiple nodes
kubectl ekslogs <node-name-1> <node-name-2> <node-name-3>

# Collect NodeDiagnostic logs from all nodes with a specific label
kubectl ekslogs -l <key>=<value>
```

# Capture network traffic on a managed node using kubectl and S3
<a name="auto-get-tcpdump"></a>

Learn how to capture network traffic on an Amazon EKS managed node that has the node monitoring agent. The agent runs tcpdump on the node, compresses capture files, and uploads them to your S3 bucket.

## Prerequisites
<a name="_prerequisites"></a>

Make sure you have the following:
+ An existing Amazon EKS Auto Mode cluster with the node monitoring agent. For more information, see [Detect node health issues and enable automatic node repair](node-health.md).
+ The `kubectl` command-line tool installed and configured to communicate with your cluster.
+ The AWS CLI installed and logged in with sufficient permissions to create S3 buckets and objects.
+ A recent version of Python 3 installed.
+ The AWS SDK for Python (Boto3) installed.
+ The PyYAML library installed (`pip install pyyaml`).

## Step 1: Create S3 bucket destination (optional)
<a name="_step_1_create_s3_bucket_destination_optional"></a>

If you don’t already have an S3 bucket to store the capture files, create one. Replace *bucket-name* and *region* with your values.

```
aws s3api create-bucket --bucket <bucket-name> \
    --region <region> \
    --create-bucket-configuration LocationConstraint=<region>
```

**Note**  
The `--create-bucket-configuration` parameter is required for all regions except `us-east-1`.

## Step 2: Start packet capture
<a name="_step_2_start_packet_capture"></a>

Use the `start-capture.py` script from the [node monitoring agent repository](https://github.com/aws/eks-node-monitoring-agent) (`tools/start-capture.py`) to generate pre-signed S3 credentials, create the `NodeDiagnostic` resource, and apply it to your cluster.

1. Identify the node you want to capture traffic from.

   ```
   kubectl get nodes
   ```

1. Save the [start-capture.py](https://github.com/aws/eks-node-monitoring-agent/blob/main/tools/start-capture.py) script from the node monitoring agent repository to your local machine, then run it. Replace *<bucket-name>* and *<node-name>* with your values.

   ```
   python3 start-capture.py --bucket <bucket-name> --node <node-name>
   ```

   Common options:

   ```
   # Capture for 5 minutes on eth0 with a filter
   python3 start-capture.py --bucket <bucket-name> --node <node-name> \
       --duration 5m --interface eth0 --filter "tcp port 443"
   
   # Preview the YAML without applying
   python3 start-capture.py --bucket <bucket-name> --node <node-name> --dry-run
   ```

   The script requires Python 3 with `boto3` and `pyyaml` installed, and `kubectl` configured for your cluster.

   The script generates a `NodeDiagnostic` resource like the following. This example is provided for reference; note that the `upload` fields require pre-signed S3 POST credentials that are generated programmatically by the script.

   ```
   apiVersion: eks.amazonaws.com/v1alpha1
   kind: NodeDiagnostic
   metadata:
     name: <node-name>                    # Required: node instance ID
   spec:
     packetCapture:
       duration: "30s"                       # Required: capture duration (max 1h)
       # interface: "eth0"                   # Optional: default is primary ENI. Use "any" for all interfaces
       # filter: "tcp port 443"             # Optional: tcpdump filter expression
       # chunkSizeMB: 10                    # Optional: file rotation size in MB (1-100, default: 10)
       upload:                               # Required: pre-signed S3 POST credentials
         url: "https://<bucket>.s3.amazonaws.com/"
         fields:
           key: "captures/<node-name>/${filename}"
           # ... other pre-signed POST fields (generated by the script)
   ```

## Step 3: Monitor capture progress
<a name="_step_3_monitor_capture_progress"></a>

Check the status of the capture.

```
kubectl describe nodediagnostic <node-name>
```

The status will show:
+  `Running` while the capture is in progress.
+  `Completed` with reason `Success` when the capture finishes and all files are uploaded.
+  `Completed` with reason `Failure` if the capture encountered errors.

To see the full status including `captureID` (used for S3 path identification):

```
kubectl get nodediagnostic <node-name> -o jsonpath='{.status.captureStatuses}'
```

## Step 4: Download capture files from S3
<a name="_step_4_download_capture_files_from_s3"></a>

Once the status shows `Success`, download the capture files from S3.

```
aws s3 cp s3://<bucket-name>/captures/ ./captures/ --recursive
```
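
If you want to confirm which capture files were uploaded (and their key names) before downloading, you can list the prefix first. The prefix below assumes the default `captures/` key layout shown earlier.

```
aws s3 ls s3://<bucket-name>/captures/ --recursive
```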

The files are gzip-compressed pcap format. Decompress and analyze with tcpdump or Wireshark:

```
gunzip captures/*.gz
tcpdump -r captures/capture.pcap0000 -n
```

## Step 5: Clean up
<a name="_step_5_clean_up"></a>

 `NodeDiagnostic` resources are not automatically deleted. Clean up after you have obtained your capture files. Deleting the resource while a capture is running will stop the capture immediately.

```
kubectl delete nodediagnostic <node-name>
```

## Configuration options and behavior
<a name="_configuration_options_and_behavior"></a>

For the full `packetCapture` spec reference, configuration options, and behavior details, see the [packet capture documentation](https://github.com/aws/eks-node-monitoring-agent/blob/main/docs/packet-capture.adoc) in the node monitoring agent repository.

# Amazon EKS Hybrid Nodes overview
<a name="hybrid-nodes-overview"></a>

With *Amazon EKS Hybrid Nodes*, you can use your on-premises and edge infrastructure as nodes in Amazon EKS clusters. AWS manages the AWS-hosted Kubernetes control plane of the Amazon EKS cluster, and you manage the hybrid nodes that run in your on-premises or edge environments. This unifies Kubernetes management across your environments and offloads Kubernetes control plane management to AWS for your on-premises and edge applications.

Amazon EKS Hybrid Nodes works with any on-premises hardware or virtual machines, bringing the efficiency, scalability, and availability of Amazon EKS to wherever your applications need to run. You can use a wide range of Amazon EKS features with Amazon EKS Hybrid Nodes including Amazon EKS add-ons, Amazon EKS Pod Identity, cluster access entries, cluster insights, and extended Kubernetes version support. Amazon EKS Hybrid Nodes natively integrates with AWS services including AWS Systems Manager, AWS IAM Roles Anywhere, Amazon Managed Service for Prometheus, and Amazon CloudWatch for centralized monitoring, logging, and identity management.

With Amazon EKS Hybrid Nodes, there are no upfront commitments or minimum fees, and you are charged per hour for the vCPU resources of your hybrid nodes when they are attached to your Amazon EKS clusters. For more pricing information, see [Amazon EKS Pricing](https://aws.amazon.com/eks/pricing/).

[![AWS Videos](https://img.youtube.com/vi/tFn9IdlddBw/0.jpg)](https://www.youtube.com/watch?v=tFn9IdlddBw)


## Features
<a name="hybrid-nodes-features"></a>

EKS Hybrid Nodes has the following high-level features:
+  **Managed Kubernetes control plane**: AWS manages the AWS-hosted Kubernetes control plane of the EKS cluster, and you manage the hybrid nodes that run in your on-premises or edge environments. This unifies Kubernetes management across your environments and offloads Kubernetes control plane management to AWS for your on-premises and edge applications. By moving the Kubernetes control plane to AWS, you can conserve on-premises capacity for your applications and trust that the Kubernetes control plane scales with your workloads.
+  **Consistent EKS experience**: Most EKS features are supported with EKS Hybrid Nodes for a consistent EKS experience across your on-premises and cloud environments including EKS add-ons, EKS Pod Identity, cluster access entries, cluster insights, extended Kubernetes version support, and more. See [Configure add-ons for hybrid nodes](hybrid-nodes-add-ons.md) for more information on the EKS add-ons supported with EKS Hybrid Nodes.
+  **Centralized observability and identity management**: EKS Hybrid Nodes natively integrates with AWS services including AWS Systems Manager, AWS IAM Roles Anywhere, Amazon Managed Service for Prometheus, and Amazon CloudWatch for centralized monitoring, logging, and identity management.
+  **Burst-to-cloud or add on-premises capacity**: A single EKS cluster can be used to run hybrid nodes and nodes in AWS Regions, AWS Local Zones, or AWS Outposts to burst-to-cloud or add on-premises capacity to your EKS clusters. See [Considerations for mixed mode clusters](hybrid-nodes-webhooks.md#hybrid-nodes-considerations-mixed-mode) for more information.
+  **Flexible infrastructure**: EKS Hybrid Nodes follows a *bring your own infrastructure* approach and is agnostic to the infrastructure you use for hybrid nodes. You can run hybrid nodes on physical or virtual machines, and x86 and ARM architectures, making it possible to migrate on-premises workloads running on hybrid nodes across different infrastructure types.
+  **Flexible networking**: With EKS Hybrid Nodes, communication between the EKS control plane and hybrid nodes is routed through the VPC and subnets you pass during cluster creation, which builds on the [existing mechanism](https://docs.aws.amazon.com/eks/latest/best-practices/subnets.html) in EKS for control plane to node networking. This is flexible to your preferred method of connecting your on-premises networks to a VPC in AWS. There are several [documented options](https://docs.aws.amazon.com/whitepapers/latest/aws-vpc-connectivity-options/network-to-amazon-vpc-connectivity-options.html) available including AWS Site-to-Site VPN, AWS Direct Connect, or your own VPN solution, and you can choose the method that best fits your use case.

## Limits
<a name="hybrid-node-limits"></a>
+ Up to 15 CIDRs for Remote Node Networks and 15 CIDRs for Remote Pod Networks per cluster are supported.

## Considerations
<a name="hybrid-nodes-general"></a>
+ EKS Hybrid Nodes can be used with new or existing EKS clusters.
+ EKS Hybrid Nodes is available in all AWS Regions, except the AWS GovCloud (US) Regions and the AWS China Regions.
+ EKS Hybrid Nodes must have a reliable connection between your on-premises environment and AWS. EKS Hybrid Nodes is not a fit for disconnected, disrupted, intermittent or limited (DDIL) environments. If you are running in a DDIL environment, consider [Amazon EKS Anywhere](https://aws.amazon.com/eks/eks-anywhere/). Reference the [Best Practices for EKS Hybrid Nodes](https://docs.aws.amazon.com/eks/latest/best-practices/hybrid-nodes-network-disconnections.html) for information on how hybrid nodes behave during network disconnection scenarios.
+ Running EKS Hybrid Nodes on cloud infrastructure, including AWS Regions, AWS Local Zones, AWS Outposts, or in other clouds, is not supported. You will be charged the hybrid nodes fee if you run hybrid nodes on Amazon EC2 instances.
+ Billing for hybrid nodes starts when the nodes join the EKS cluster and stops when the nodes are removed from the cluster. Be sure to remove your hybrid nodes from your EKS cluster if you are not using them.

## Additional resources
<a name="hybrid-nodes-resources"></a>
+  [https://www.eksworkshop.com/docs/networking/eks-hybrid-nodes/](https://www.eksworkshop.com/docs/networking/eks-hybrid-nodes/): Step-by-step instructions for deploying EKS Hybrid Nodes in a demo environment.
+  [https://www.youtube.com/watch?v=ZxC7SkemxvU](https://www.youtube.com/watch?v=ZxC7SkemxvU): AWS re:Invent session introducing the EKS Hybrid Nodes launch with a customer showing how they are using EKS Hybrid Nodes in their environment.
+  [https://repost.aws/articles/ARL44xuau6TG2t-JoJ3mJ5Mw/unpacking-the-cluster-networking-for-amazon-eks-hybrid-nodes](https://repost.aws/articles/ARL44xuau6TG2t-JoJ3mJ5Mw/unpacking-the-cluster-networking-for-amazon-eks-hybrid-nodes): Article explaining various methods for setting up networking for EKS Hybrid Nodes.
+  [https://aws.amazon.com/blogs/containers/run-genai-inference-across-environments-with-amazon-eks-hybrid-nodes/](https://aws.amazon.com/blogs/containers/run-genai-inference-across-environments-with-amazon-eks-hybrid-nodes/): Blog post showing how to run GenAI inference across environments with EKS Hybrid Nodes.

# Prerequisite setup for hybrid nodes
<a name="hybrid-nodes-prereqs"></a>

To use Amazon EKS Hybrid Nodes, you must have private connectivity between your on-premises environment and AWS, bare metal servers or virtual machines with a supported operating system, and AWS IAM Roles Anywhere or AWS Systems Manager (SSM) hybrid activations configured. You are responsible for managing these prerequisites throughout the hybrid nodes lifecycle.
+ Hybrid network connectivity between your on-premises environment and AWS
+ Infrastructure in the form of physical or virtual machines
+ Operating system that is compatible with hybrid nodes
+ On-premises IAM credentials provider configured

![\[Hybrid node network connectivity.\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-prereq-diagram.png)


## Hybrid network connectivity
<a name="hybrid-nodes-prereqs-connect"></a>

The communication between the Amazon EKS control plane and hybrid nodes is routed through the VPC and subnets you pass during cluster creation, which builds on the [existing mechanism](https://aws.github.io/aws-eks-best-practices/networking/subnets/) in Amazon EKS for control plane to node networking. There are several [documented options](https://docs.aws.amazon.com/whitepapers/latest/aws-vpc-connectivity-options/network-to-amazon-vpc-connectivity-options.html) available for you to connect your on-premises environment with your VPC including AWS Site-to-Site VPN, AWS Direct Connect, or your own VPN connection. Reference the [AWS Site-to-Site VPN](https://docs.aws.amazon.com/vpn/latest/s2svpn/VPC_VPN.html) and [AWS Direct Connect](https://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html) user guides for more information on how to use those solutions for your hybrid network connection.

For an optimal experience, we recommend that you have reliable network connectivity of at least 100 Mbps and a maximum of 200ms round trip latency for the hybrid nodes connection to the AWS Region. This is general guidance that accommodates most use cases but is not a strict requirement. The bandwidth and latency requirements can vary depending on the number of hybrid nodes and your workload characteristics, such as application image size, application elasticity, monitoring and logging configurations, and application dependencies on accessing data stored in other AWS services. We recommend that you test with your own applications and environments before deploying to production to validate that your networking setup meets the requirements for your workloads.

## On-premises network configuration
<a name="hybrid-nodes-prereqs-onprem"></a>

You must enable inbound network access from the Amazon EKS control plane to your on-premises environment to allow the Amazon EKS control plane to communicate with the `kubelet` running on hybrid nodes and optionally with webhooks running on your hybrid nodes. Additionally, you must enable outbound network access for your hybrid nodes and components running on them to communicate with the Amazon EKS control plane. You can configure this communication to stay fully private to your AWS Direct Connect, AWS Site-to-Site VPN, or your own VPN connection.

The Classless Inter-Domain Routing (CIDR) ranges you use for your on-premises node and pod networks must use IPv4 RFC-1918 or CGNAT address ranges. Your on-premises router must be configured with routes to your on-premises nodes and optionally pods. See [On-premises networking configuration](hybrid-nodes-networking.md#hybrid-nodes-networking-on-prem) for more information on the on-premises network requirements, including the full list of required ports and protocols that must be enabled in your firewall and on-premises environment.

## EKS cluster configuration
<a name="hybrid-nodes-prereqs-cluster"></a>

To minimize latency, we recommend that you create your Amazon EKS cluster in the AWS Region closest to your on-premises or edge environment. You pass your on-premises node and pod CIDRs during Amazon EKS cluster creation via two API fields: `RemoteNodeNetwork` and `RemotePodNetwork`. You may need to discuss with your on-premises network team to identify your on-premises node and pod CIDRs. The node CIDR is allocated from your on-premises network and the pod CIDR is allocated from the Container Network Interface (CNI) you use if you are using an overlay network for your CNI. Cilium and Calico use overlay networks by default.

The on-premises node and pod CIDRs you configure via the `RemoteNodeNetwork` and `RemotePodNetwork` fields are used to configure the Amazon EKS control plane to route traffic through your VPC to the `kubelet` and the pods running on your hybrid nodes. Your on-premises node and pod CIDRs cannot overlap with each other, the VPC CIDR you pass during cluster creation, or the service IPv4 configuration for your Amazon EKS cluster. Also, Pod CIDRs must be unique to each EKS cluster so that your on-premises router can route traffic.

We recommend that you use either public or private endpoint access for the Amazon EKS Kubernetes API server endpoint. If you choose “Public and Private”, the Amazon EKS Kubernetes API server endpoint will always resolve to the public IPs for hybrid nodes running outside of your VPC, which can prevent your hybrid nodes from joining the cluster. When you use public endpoint access, the Kubernetes API server endpoint is resolved to public IPs and the communication from hybrid nodes to the Amazon EKS control plane will be routed over the internet. When you choose private endpoint access, the Kubernetes API server endpoint is resolved to private IPs and the communication from hybrid nodes to the Amazon EKS control plane will be routed over your private connectivity link, in most cases AWS Direct Connect or AWS Site-to-Site VPN.
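
For illustration, the following sketch shows how you might pass the remote node and pod CIDRs with the AWS CLI at cluster creation. All IDs, ARNs, and CIDR values are placeholders, and the `--remote-network-config` shorthand shown here is an assumption you should verify against the current `aws eks create-cluster` reference before using it.

```
# Hedged sketch: create a cluster with private endpoint access and remote network configuration.
# All IDs, ARNs, and CIDRs are placeholders; verify the --remote-network-config syntax against
# the current `aws eks create-cluster` documentation.
aws eks create-cluster \
    --name CLUSTER_NAME \
    --role-arn arn:aws:iam::AWS_ACCOUNT_ID:role/AmazonEKSClusterRole \
    --resources-vpc-config subnetIds=SUBNET_ID1,SUBNET_ID2,securityGroupIds=SG_ID,endpointPrivateAccess=true,endpointPublicAccess=false \
    --remote-network-config '{"remoteNodeNetworks":[{"cidrs":["REMOTE_NODE_CIDR"]}],"remotePodNetworks":[{"cidrs":["REMOTE_POD_CIDR"]}]}'
```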

## VPC configuration
<a name="hybrid-nodes-prereqs-vpc"></a>

You must configure the VPC you pass during Amazon EKS cluster creation with routes in its routing table for your on-premises node and optionally pod networks with your virtual private gateway (VGW) or transit gateway (TGW) as the target. An example is shown below. Replace `REMOTE_NODE_CIDR` and `REMOTE_POD_CIDR` with the values for your on-premises network.


| Destination | Target | Description | 
| --- | --- | --- | 
|  10.226.0.0/16  |  local  |  Traffic local to the VPC routes within the VPC  | 
|  `REMOTE_NODE_CIDR`  |  tgw-abcdef123456  |  On-prem node CIDR, route traffic to the TGW  | 
|  `REMOTE_POD_CIDR`  |  tgw-abcdef123456  |  On-prem pod CIDR, route traffic to the TGW  | 

## Security group configuration
<a name="hybrid-nodes-prereqs-sg"></a>

When you create a cluster, Amazon EKS creates a security group that’s named `eks-cluster-sg-<cluster-name>-<uniqueID>`. You cannot alter the inbound rules of this Cluster Security Group but you can restrict the outbound rules. You must add an additional security group to your cluster to enable the kubelet and optionally webhooks running on your hybrid nodes to contact the Amazon EKS control plane. The required inbound rules for this additional security group are shown below. Replace `REMOTE_NODE_CIDR` and `REMOTE_POD_CIDR` with the values for your on-premises network.


| Name | Security group rule ID | IP version | Type | Protocol | Port range | Source | 
| --- | --- | --- | --- | --- | --- | --- | 
|  On-prem node inbound  |  sgr-abcdef123456  |  IPv4  |  HTTPS  |  TCP  |  443  |  `REMOTE_NODE_CIDR`  | 
|  On-prem pod inbound  |  sgr-abcdef654321  |  IPv4  |  HTTPS  |  TCP  |  443  |  `REMOTE_POD_CIDR`  | 

## Infrastructure
<a name="hybrid-nodes-prereqs-infra"></a>

You must have bare metal servers or virtual machines available to use as hybrid nodes. Hybrid nodes are agnostic to the underlying infrastructure and support x86 and ARM architectures. Amazon EKS Hybrid Nodes follows a “bring your own infrastructure” approach, where you are responsible for provisioning and managing the bare metal servers or virtual machines that you use for hybrid nodes. While there is not a strict minimum resource requirement, we recommend that you use hosts with at least 1 vCPU and 1GiB RAM for hybrid nodes.

## Operating system
<a name="hybrid-nodes-prereqs-os"></a>

Bottlerocket, Amazon Linux 2023 (AL2023), Ubuntu, and RHEL are validated on an ongoing basis for use as the node operating system for hybrid nodes. Bottlerocket is supported by AWS in VMware vSphere environments only. AL2023 is not covered by AWS Support Plans when run outside of Amazon EC2 and can only be used in on-premises virtualized environments; see the [Amazon Linux 2023 User Guide](https://docs.aws.amazon.com/linux/al2023/ug/outside-ec2.html) for more information. AWS supports the hybrid nodes integration with Ubuntu and RHEL operating systems but does not provide support for the operating system itself.

You are responsible for operating system provisioning and management. When you are testing hybrid nodes for the first time, it is easiest to run the Amazon EKS Hybrid Nodes CLI (`nodeadm`) on an already provisioned host. For production deployments, we recommend that you include `nodeadm` in your golden operating system images with it configured to run as a systemd service to automatically join hosts to Amazon EKS clusters at host startup.
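
The following is a minimal sketch of running `nodeadm` as a systemd service so hosts join your cluster at startup. The unit name, binary path, and configuration path are assumptions for this sketch; adapt them to how your image build lays out `nodeadm` and its node configuration.

```
# Hypothetical systemd unit that joins the host to your EKS cluster at boot.
# The unit name, nodeadm path, and config path are assumptions; adapt to your image build.
cat > /etc/systemd/system/nodeadm-init.service <<'EOF'
[Unit]
Description=Join this host to an Amazon EKS cluster with nodeadm
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/usr/local/bin/nodeadm init -c file:///etc/nodeadm/nodeConfig.yaml

[Install]
WantedBy=multi-user.target
EOF

systemctl enable nodeadm-init.service
```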

## On-premises IAM credentials provider
<a name="hybrid-nodes-prereqs-iam"></a>

Amazon EKS Hybrid Nodes use temporary IAM credentials provisioned by AWS SSM hybrid activations or AWS IAM Roles Anywhere to authenticate with the Amazon EKS cluster. You must use either AWS SSM hybrid activations or AWS IAM Roles Anywhere with the Amazon EKS Hybrid Nodes CLI (`nodeadm`). We recommend that you use AWS SSM hybrid activations if you do not have existing Public Key Infrastructure (PKI) with a Certificate Authority (CA) and certificates for your on-premises environments. If you do have existing PKI and certificates on-premises, use AWS IAM Roles Anywhere.

Similar to the [Amazon EKS node IAM role](create-node-role.md) for nodes running on Amazon EC2, you will create a Hybrid Nodes IAM Role with the required permissions to join hybrid nodes to Amazon EKS clusters. If you are using AWS IAM Roles Anywhere, configure a trust policy that allows AWS IAM Roles Anywhere to assume the Hybrid Nodes IAM Role and configure your AWS IAM Roles Anywhere profile with the Hybrid Nodes IAM Role as an assumable role. If you are using AWS SSM, configure a trust policy that allows AWS SSM to assume the Hybrid Nodes IAM Role and create the hybrid activation with the Hybrid Nodes IAM Role. See [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md) for how to create the Hybrid Nodes IAM Role with the required permissions.

# Prepare networking for hybrid nodes
<a name="hybrid-nodes-networking"></a>

This topic provides an overview of the networking setup you must have configured before creating your Amazon EKS cluster and attaching hybrid nodes. This guide assumes you have met the prerequisite requirements for hybrid network connectivity using [AWS Site-to-Site VPN](https://docs.aws.amazon.com/vpn/latest/s2svpn/SetUpVPNConnections.html), [AWS Direct Connect](https://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html), or your own VPN solution.

![\[Hybrid node network connectivity.\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-prereq-diagram.png)


## On-premises networking configuration
<a name="hybrid-nodes-networking-on-prem"></a>

### Minimum network requirements
<a name="hybrid-nodes-networking-min-reqs"></a>

For an optimal experience, we recommend that you have reliable network connectivity of at least 100 Mbps and a maximum of 200ms round trip latency for the hybrid nodes connection to the AWS Region. This is general guidance that accommodates most use cases but is not a strict requirement. The bandwidth and latency requirements can vary depending on the number of hybrid nodes and your workload characteristics, such as application image size, application elasticity, monitoring and logging configurations, and application dependencies on accessing data stored in other AWS services. We recommend that you test with your own applications and environments before deploying to production to validate that your networking setup meets the requirements for your workloads.

### On-premises node and pod CIDRs
<a name="hybrid-nodes-networking-on-prem-cidrs"></a>

Identify the node and pod CIDRs you will use for your hybrid nodes and the workloads running on them. The node CIDR is allocated from your on-premises network and the pod CIDR is allocated from your Container Network Interface (CNI) if you are using an overlay network for your CNI. You pass your on-premises node CIDRs and pod CIDRs as inputs when you create your EKS cluster with the `RemoteNodeNetwork` and `RemotePodNetwork` fields. Your on-premises node CIDRs must be routable on your on-premises network. See the following section for information on the on-premises pod CIDR routability.

The on-premises node and pod CIDR blocks must meet the following requirements:

1. Be within one of the following `IPv4` RFC-1918 ranges: `10.0.0.0/8`, `172.16.0.0/12`, or `192.168.0.0/16`, or within the CGNAT range defined by RFC 6598: `100.64.0.0/10`.

1. Not overlap with each other, the VPC CIDR for your EKS cluster, or your Kubernetes service `IPv4` CIDR.

### On-premises pod network routing
<a name="hybrid-nodes-networking-on-prem-pod-routing"></a>

When using EKS Hybrid Nodes, we generally recommend that you make your on-premises pod CIDRs routable on your on-premises network to enable full cluster communication and functionality between cloud and on-premises environments.

 **Routable pod networks** 

If you are able to make your pod network routable on your on-premises network, follow the guidance below.

1. Configure your on-premises pod CIDR in the `RemotePodNetwork` field for your EKS cluster, in your VPC route tables, and in your EKS cluster security group.

1. There are several techniques you can use to make your on-premises pod CIDR routable on your on-premises network, including Border Gateway Protocol (BGP), static routes, or other custom routing solutions. BGP is the recommended solution as it is more scalable and easier to manage than alternative solutions that require custom or manual route configuration. AWS supports the BGP capabilities of Cilium and Calico for advertising pod CIDRs (a sketch follows this list); see [Configure CNI for hybrid nodes](hybrid-nodes-cni.md) and [Routable remote Pod CIDRs](hybrid-nodes-concepts-kubernetes.md#hybrid-nodes-concepts-k8s-pod-cidrs) for more information.

1. Webhooks can run on hybrid nodes as the EKS control plane is able to communicate with the Pod IP addresses assigned to the webhooks.

1. Workloads running on cloud nodes are able to communicate directly with workloads running on hybrid nodes in the same EKS cluster.

1. Other AWS services, such as AWS Application Load Balancers and Amazon Managed Service for Prometheus, are able to communicate with workloads running on hybrid nodes to balance network traffic and scrape pod metrics.
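
For example, if you use Cilium's BGP control plane to advertise your pod CIDRs to an on-premises router, a peering policy along the following lines can be applied. This is a minimal sketch: the ASNs, the peer address, and the node selector label are placeholders, and you should confirm the CRD fields against the Cilium documentation before using it.

```
# Hypothetical Cilium BGP peering policy that advertises the pod CIDR to an on-premises router.
# ASNs, the peer address, and the node selector label are placeholders for this sketch.
kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: hybrid-nodes-bgp
spec:
  nodeSelector:
    matchLabels:
      eks.amazonaws.com/compute-type: hybrid
  virtualRouters:
    - localASN: 64512
      exportPodCIDR: true
      neighbors:
        - peerAddress: "ONPREM_ROUTER_IP/32"
          peerASN: 64513
EOF
```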

 **Unroutable pod networks** 

If you are *not* able to make your pod networks routable on your on-premises network, follow the guidance below.

1. Webhooks cannot run on hybrid nodes because webhooks require connectivity from the EKS control plane to the Pod IP addresses assigned to the webhooks. In this case, we recommend that you run webhooks on cloud nodes in the same EKS cluster as your hybrid nodes, see [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md) for more information.

1. Workloads running on cloud nodes are not able to communicate directly with workloads running on hybrid nodes when using the VPC CNI for cloud nodes and Cilium or Calico for hybrid nodes.

1. Use Service Traffic Distribution to keep traffic local to the zone it is originating from. For more information on Service Traffic Distribution, see [Configure Service Traffic Distribution](hybrid-nodes-webhooks.md#hybrid-nodes-mixed-service-traffic-distribution).

1. Configure your CNI to use egress masquerade or network address translation (NAT) for pod traffic as it leaves your on-premises hosts. This is enabled by default in Cilium. Calico requires `natOutgoing` to be set to `true`; a sketch follows this list.

1. Other AWS services, such as AWS Application Load Balancers and Amazon Managed Service for Prometheus, are not able to communicate with workloads running on hybrid nodes.
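
For example, with Calico you can enable outgoing NAT on the IP pool that backs the pod network on your hybrid nodes. This is a minimal sketch assuming `calicoctl` is installed; `REMOTE_POD_CIDR` is a placeholder for your on-premises pod CIDR.

```
# Hypothetical Calico IP pool with outgoing NAT so pod traffic is masqueraded as it
# leaves your on-premises hosts. REMOTE_POD_CIDR is a placeholder.
calicoctl apply -f - <<'EOF'
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: hybrid-pod-pool
spec:
  cidr: REMOTE_POD_CIDR
  natOutgoing: true
EOF
```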

### Access required during hybrid node installation and upgrade
<a name="hybrid-nodes-networking-access-reqs"></a>

You must have access to the following domains during the installation process, when you install the hybrid nodes dependencies on your hosts. This process can be done once when you are building your operating system images or it can be done on each host at runtime. This includes initial installation and when you upgrade the Kubernetes version of your hybrid nodes.

Some packages are installed using the OS’s default package manager. For AL2023 and RHEL, the `yum` command is used to install `containerd`, `ca-certificates`, `iptables` and `amazon-ssm-agent`. For Ubuntu, `apt` is used to install `containerd`, `ca-certificates`, and `iptables`, and `snap` is used to install `amazon-ssm-agent`.


| Component | URL | Protocol | Port | 
| --- | --- | --- | --- | 
|  EKS node artifacts (S3)  |  https://hybrid-assets.eks.amazonaws.com  |  HTTPS  |  443  | 
|   [EKS service endpoints](https://docs.aws.amazon.com/general/latest/gr/eks.html)   |  https://eks.*region*.amazonaws.com  |  HTTPS  |  443  | 
|   [ECR service endpoints](https://docs.aws.amazon.com/general/latest/gr/ecr.html)   |  https://api.ecr.*region*.amazonaws.com  |  HTTPS  |  443  | 
|  EKS ECR endpoints  |  See [View Amazon container image registries for Amazon EKS add-ons](add-ons-images.md) for regional endpoints.  |  HTTPS  |  443  | 
|  SSM binary endpoint 1   |  https://amazon-ssm-*region*.s3.*region*.amazonaws.com  |  HTTPS  |  443  | 
|   [SSM service endpoint](https://docs.aws.amazon.com/general/latest/gr/ssm.html) 1   |  https://ssm.*region*.amazonaws.com  |  HTTPS  |  443  | 
|  IAM Anywhere binary endpoint 2   |  https://rolesanywhere.amazonaws.com  |  HTTPS  |  443  | 
|   [IAM Anywhere service endpoint](https://docs.aws.amazon.com/general/latest/gr/rolesanywhere.html) 2   |  https://rolesanywhere.*region*.amazonaws.com  |  HTTPS  |  443  | 
|  Operating System package manager endpoints  |  Package repository endpoints are OS-specific and might vary by geographic region.  |  HTTPS  |  443  | 

**Note**  
 1 Access to the AWS SSM endpoints is only required if you are using AWS SSM hybrid activations for your on-premises IAM credential provider.  
 2 Access to the AWS IAM Roles Anywhere endpoints is only required if you are using AWS IAM Roles Anywhere for your on-premises IAM credential provider.

### Access required for ongoing cluster operations
<a name="hybrid-nodes-networking-access-reqs-ongoing"></a>

The following network access for your on-premises firewall is required for ongoing cluster operations.

**Important**  
Depending on your choice of CNI, you need to configure additional network access rules for the CNI ports. See the [Cilium documentation](https://docs.cilium.io/en/stable/operations/system_requirements/#firewall-rules) and the [Calico documentation](https://docs.tigera.io/calico/latest/getting-started/kubernetes/requirements#network-requirements) for details.


| Type | Protocol | Direction | Port | Source | Destination | Usage | 
| --- | --- | --- | --- | --- | --- | --- | 
|  HTTPS  |  TCP  |  Outbound  |  443  |  Remote Node CIDR(s)  |  EKS cluster IPs 1   |  kubelet to Kubernetes API server  | 
|  HTTPS  |  TCP  |  Outbound  |  443  |  Remote Pod CIDR(s)  |  EKS cluster IPs 1   |  Pod to Kubernetes API server  | 
|  HTTPS  |  TCP  |  Outbound  |  443  |  Remote Node CIDR(s)  |   [SSM service endpoint](https://docs.aws.amazon.com/general/latest/gr/ssm.html)   |  SSM hybrid activations credential refresh and SSM heartbeats every 5 minutes  | 
|  HTTPS  |  TCP  |  Outbound  |  443  |  Remote Node CIDR(s)  |   [IAM Anywhere service endpoint](https://docs.aws.amazon.com/general/latest/gr/rolesanywhere.html)   |  IAM Roles Anywhere credential refresh  | 
|  HTTPS  |  TCP  |  Outbound  |  443  |  Remote Pod CIDR(s)  |   [STS Regional Endpoint](https://docs.aws.amazon.com/general/latest/gr/sts.html)   |  Pod to STS endpoint, only required for IRSA  | 
|  HTTPS  |  TCP  |  Outbound  |  443  |  Remote Node CIDR(s)  |   [Amazon EKS Auth service endpoint](https://docs.aws.amazon.com/general/latest/gr/eks.html)   |  Node to Amazon EKS Auth endpoint, only required for Amazon EKS Pod Identity  | 
|  HTTPS  |  TCP  |  Inbound  |  10250  |  EKS cluster IPs 1   |  Remote Node CIDR(s)  |  Kubernetes API server to kubelet  | 
|  HTTPS  |  TCP  |  Inbound  |  Webhook ports  |  EKS cluster IPs 1   |  Remote Pod CIDR(s)  |  Kubernetes API server to webhooks  | 
|  HTTPS  |  TCP,UDP  |  Inbound,Outbound  |  53  |  Remote Pod CIDR(s)  |  Remote Pod CIDR(s)  |  Pod to CoreDNS. If you run at least 1 replica of CoreDNS in the cloud, you must allow DNS traffic to the VPC where CoreDNS is running.  | 
|  User-defined  |  User-defined  |  Inbound,Outbound  |  App ports  |  Remote Pod CIDR(s)  |  Remote Pod CIDR(s)  |  Pod to Pod  | 

**Note**  
 1 The IPs of the EKS cluster. See the following section on Amazon EKS elastic network interfaces.

### Amazon EKS network interfaces
<a name="hybrid-nodes-networking-eks-network-interfaces"></a>

Amazon EKS attaches network interfaces to the subnets in the VPC you pass during cluster creation to enable the communication between the EKS control plane and your VPC. The network interfaces that Amazon EKS creates can be found after cluster creation in the Amazon EC2 console or with the AWS CLI. The original network interfaces are deleted and new network interfaces are created when changes are applied on your EKS cluster, such as Kubernetes version upgrades. You can restrict the IP range for the Amazon EKS network interfaces by using constrained subnet sizes for the subnets you pass during cluster creation, which makes it easier to configure your on-premises firewall to allow inbound/outbound connectivity to this known, constrained set of IPs. To control which subnets network interfaces are created in, you can limit the number of subnets you specify when you create a cluster or you can update the subnets after creating the cluster.

The network interfaces provisioned by Amazon EKS have a description of the format `Amazon EKS your-cluster-name `. See the example below for an AWS CLI command you can use to find the IP addresses of the network interfaces that Amazon EKS provisions. Replace `VPC_ID` with the ID of the VPC you pass during cluster creation.

```
aws ec2 describe-network-interfaces \
    --query "NetworkInterfaces[?(VpcId == 'VPC_ID' && contains(Description, 'Amazon EKS'))].PrivateIpAddress"
```

## AWS VPC and subnet setup
<a name="hybrid-nodes-networking-vpc"></a>

The existing [VPC and subnet requirements](network-reqs.md) for Amazon EKS apply to clusters with hybrid nodes. Additionally, your VPC CIDR can’t overlap with your on-premises node and pod CIDRs. You must configure routes in your VPC routing table for your on-premises node and optionally pod CIDRs. These routes must be setup to route traffic to the gateway you are using for your hybrid network connectivity, which is commonly a virtual private gateway (VGW) or transit gateway (TGW). If you are using TGW or VGW to connect your VPC with your on-premises environment, you must create a TGW or VGW attachment for your VPC. Your VPC must have DNS hostname and DNS resolution support.

The following steps use the AWS CLI. You can also create these resources in the AWS Management Console or with other interfaces such as AWS CloudFormation, AWS CDK, or Terraform.

### Step 1: Create VPC
<a name="_step_1_create_vpc"></a>

1. Run the following command to create a VPC. Replace `VPC_CIDR` with an IPv4 CIDR range that is either RFC 1918 (private), CGNAT (RFC 6598), or non-RFC 1918/non-CGNAT (public) (for example, `10.0.0.0/16`). Note: DNS resolution, which is an EKS requirement, is enabled for the VPC by default.

   ```
   aws ec2 create-vpc --cidr-block VPC_CIDR
   ```

1. Enable DNS hostnames for your VPC. Note, DNS resolution is enabled for the VPC by default. Replace `VPC_ID` with the ID of the VPC you created in the previous step.

   ```
   aws ec2 modify-vpc-attribute --vpc-id VPC_ID --enable-dns-hostnames
   ```

### Step 2: Create subnets
<a name="_step_2_create_subnets"></a>

Create at least 2 subnets. Amazon EKS uses these subnets for the cluster network interfaces. For more information, see the [Subnets requirements and considerations](network-reqs.md#network-requirements-subnets).

1. You can find the availability zones for an AWS Region with the following command. Replace `us-west-2` with your region.

   ```
    aws ec2 describe-availability-zones \
         --query "AvailabilityZones[?(RegionName == 'us-west-2')].ZoneName"
   ```

1. Create a subnet. Replace `VPC_ID` with the ID of the VPC. Replace `SUBNET_CIDR` with the CIDR block for your subnet (for example, `10.0.1.0/24`). Replace `AZ` with the availability zone where the subnet will be created (for example, `us-west-2a`). The subnets you create must be in at least 2 different availability zones.

   ```
   aws ec2 create-subnet \
       --vpc-id VPC_ID \
       --cidr-block SUBNET_CIDR \
       --availability-zone AZ
   ```

### (Optional) Step 3: Attach VPC with Amazon VPC Transit Gateway (TGW) or AWS Direct Connect virtual private gateway (VGW)
<a name="optional_step_3_attach_vpc_with_amazon_vpc_transit_gateway_tgw_or_shared_aws_direct_connect_virtual_private_gateway_vgw"></a>

If you are using a TGW or VGW, attach your VPC to the TGW or VGW. For more information, see [Amazon VPC attachments in Amazon VPC Transit Gateways](https://docs.aws.amazon.com/vpc/latest/tgw/tgw-vpc-attachments.html) or [AWS Direct Connect virtual private gateway associations](https://docs.aws.amazon.com/vpn/latest/s2svpn/how_it_works.html#VPNGateway).

 **Transit Gateway** 

Run the following command to attach a Transit Gateway. Replace `VPC_ID` with the ID of the VPC. Replace `SUBNET_ID1` and `SUBNET_ID2` with the IDs of the subnets you created in the previous step. Replace `TGW_ID` with the ID of your TGW.

```
aws ec2 create-transit-gateway-vpc-attachment \
    --vpc-id VPC_ID \
    --subnet-ids SUBNET_ID1 SUBNET_ID2 \
    --transit-gateway-id TGW_ID
```

 **Virtual Private Gateway** 

Run the following command to attach a virtual private gateway. Replace `VPN_ID` with the ID of your VGW. Replace `VPC_ID` with the ID of the VPC.

```
aws ec2 attach-vpn-gateway \
    --vpn-gateway-id VPN_ID \
    --vpc-id VPC_ID
```

### (Optional) Step 4: Create route table
<a name="_optional_step_4_create_route_table"></a>

You can modify the main route table for the VPC or you can create a custom route table. The following steps create a custom route table with the routes to on-premises node and pod CIDRs. For more information, see [Subnet route tables](https://docs.aws.amazon.com/vpc/latest/userguide/subnet-route-tables.html). Replace `VPC_ID` with the ID of the VPC.

```
aws ec2 create-route-table --vpc-id VPC_ID
```

### Step 5: Create routes for on-premises nodes and pods
<a name="_step_5_create_routes_for_on_premises_nodes_and_pods"></a>

Create routes in the route table for your on-premises node and pod CIDRs. You can modify the main route table for the VPC or use the custom route table you created in the previous step.

The examples below show how to create routes for your on-premises node and pod CIDRs. In the examples, a transit gateway (TGW) is used to connect the VPC with the on-premises environment. If you have multiple on-premises node and pod CIDRs, repeat the steps for each CIDR.
+ If you are using an internet gateway or a virtual private gateway (VGW) replace `--transit-gateway-id` with `--gateway-id`.
+ Replace `RT_ID` with the ID of the route table you created in the previous step.
+ Replace `REMOTE_NODE_CIDR` with the CIDR range you will use for your hybrid nodes.
+ Replace `REMOTE_POD_CIDR` with the CIDR range you will use for the pods running on hybrid nodes. The pod CIDR range corresponds to the Container Networking Interface (CNI) configuration, which most commonly uses an overlay network on-premises. For more information, see [Configure CNI for hybrid nodes](hybrid-nodes-cni.md).
+ Replace `TGW_ID` with the ID of your TGW.

 **Remote node network** 

```
aws ec2 create-route \
    --route-table-id RT_ID \
    --destination-cidr-block REMOTE_NODE_CIDR \
    --transit-gateway-id TGW_ID
```

 **Remote Pod network** 

```
aws ec2 create-route \
    --route-table-id RT_ID \
    --destination-cidr-block REMOTE_POD_CIDR \
    --transit-gateway-id TGW_ID
```

### (Optional) Step 6: Associate subnets with route table
<a name="_optional_step_6_associate_subnets_with_route_table"></a>

If you created a custom route table in the previous step, associate each of the subnets you created in the previous step with your custom route table. If you are modifying the VPC main route table, the subnets are automatically associated with the main route table of the VPC and you can skip this step.

Run the following command for each of the subnets you created in the previous steps. Replace `RT_ID` with the route table you created in the previous step. Replace `SUBNET_ID` with the ID of a subnet.

```
aws ec2 associate-route-table --route-table-id RT_ID --subnet-id SUBNET_ID
```

## Cluster security group configuration
<a name="hybrid-nodes-networking-cluster-sg"></a>

The following access for your EKS cluster security group is required for ongoing cluster operations. Amazon EKS automatically creates the required **inbound** security group rules for hybrid nodes when you create or update your cluster with remote node and pod networks configured. Because security groups allow all **outbound** traffic by default, Amazon EKS doesn’t automatically modify the **outbound** rules of the cluster security group for hybrid nodes. If you want to customize the cluster security group, you can limit traffic to the rules in the following table.


| Type | Protocol | Direction | Port | Source | Destination | Usage | 
| --- | --- | --- | --- | --- | --- | --- | 
|  HTTPS  |  TCP  |  Inbound  |  443  |  Remote Node CIDR(s)  |  N/A  |  Kubelet to Kubernetes API server  | 
|  HTTPS  |  TCP  |  Inbound  |  443  |  Remote Pod CIDR(s)  |  N/A  |  Pods requiring access to K8s API server when the CNI is not using NAT for the pod traffic.  | 
|  HTTPS  |  TCP  |  Outbound  |  10250  |  N/A  |  Remote Node CIDR(s)  |  Kubernetes API server to Kubelet  | 
|  HTTPS  |  TCP  |  Outbound  |  Webhook ports  |  N/A  |  Remote Pod CIDR(s)  |  Kubernetes API server to webhook (if running webhooks on hybrid nodes)  | 

**Important**  
 **Security group rule limits**: Amazon EC2 security groups have a maximum of 60 inbound rules by default. Amazon EKS might not be able to add the required inbound rules if your cluster security group is approaching this limit. In this case, you might need to manually add the missing inbound rules.  
 **CIDR cleanup responsibility**: If you remove remote node or pod networks from EKS clusters, EKS does not automatically remove the corresponding security group rules. You are responsible for manually removing unused remote node or pod networks from your security group rules.

For more information about the cluster security group that Amazon EKS creates, see [View Amazon EKS security group requirements for clusters](sec-group-reqs.md).

### (Optional) Manual security group configuration
<a name="_optional_manual_security_group_configuration"></a>

If you need to create additional security groups or modify the automatically created rules, you can use the following commands as reference. By default, the command below creates a security group that allows all outbound access. You can restrict outbound access to include only the rules above. If you’re considering limiting the outbound rules, we recommend that you thoroughly test all of your applications and pod connectivity before you apply your changed rules to a production cluster.
+ In the first command, replace `SG_NAME` with a name for your security group
+ In the first command, replace `VPC_ID` with the ID of the VPC you created in the previous step
+ In the second command, replace `SG_ID` with the ID of the security group you created in the first command
+ In the second command, replace `REMOTE_NODE_CIDR` and `REMOTE_POD_CIDR` with the values for your hybrid nodes and on-premises network.

```
aws ec2 create-security-group \
    --group-name SG_NAME \
    --description "security group for hybrid nodes" \
    --vpc-id VPC_ID
```

```
aws ec2 authorize-security-group-ingress \
    --group-id SG_ID \
    --ip-permissions '[{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "REMOTE_NODE_CIDR"}, {"CidrIp": "REMOTE_POD_CIDR"}]}]'
```
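
If you also want to restrict the outbound rules of the cluster security group to the rules in the preceding table, the following is a hedged sketch of one way to do it. It only covers the Kubernetes API server to kubelet path on port 10250; add an equivalent rule for your webhook ports before removing the default allow-all egress rule, and test thoroughly before applying this to a production cluster.

```
# Sketch: add an outbound rule for kubelet traffic (port 10250) and remove the default
# allow-all egress rule. SG_ID and REMOTE_NODE_CIDR are placeholders; add equivalent
# rules for your webhook ports before revoking the default rule.
aws ec2 authorize-security-group-egress \
    --group-id SG_ID \
    --ip-permissions '[{"IpProtocol": "tcp", "FromPort": 10250, "ToPort": 10250, "IpRanges": [{"CidrIp": "REMOTE_NODE_CIDR"}]}]'

aws ec2 revoke-security-group-egress \
    --group-id SG_ID \
    --ip-permissions '[{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]'
```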

# Prepare operating system for hybrid nodes
<a name="hybrid-nodes-os"></a>

Bottlerocket, Amazon Linux 2023 (AL2023), Ubuntu, and RHEL are validated on an ongoing basis for use as the node operating system for hybrid nodes. Bottlerocket is supported by AWS in VMware vSphere environments only. AL2023 is not covered by AWS Support Plans when run outside of Amazon EC2 and can only be used in on-premises virtualized environments; see the [Amazon Linux 2023 User Guide](https://docs.aws.amazon.com/linux/al2023/ug/outside-ec2.html) for more information. AWS supports the hybrid nodes integration with Ubuntu and RHEL operating systems but does not provide support for the operating system itself.

You are responsible for operating system provisioning and management. When you are testing hybrid nodes for the first time, it is easiest to run the Amazon EKS Hybrid Nodes CLI (`nodeadm`) on an already provisioned host. For production deployments, we recommend that you include `nodeadm` in your operating system images with it configured to run as a systemd service to automatically join hosts to Amazon EKS clusters at host startup. If you are using Bottlerocket as your node operating system on vSphere, you do not need to use `nodeadm` as Bottlerocket already contains the dependencies required for hybrid nodes and will automatically connect to the cluster you configure upon host startup.

## Version compatibility
<a name="_version_compatibility"></a>

The table below represents the operating system versions that are compatible and validated to use as the node operating system for hybrid nodes. If you are using other operating system variants or versions that are not included in this table, then the compatibility of hybrid nodes with your operating system variant or version is not covered by AWS Support. Hybrid nodes are agnostic to the underlying infrastructure and support x86 and ARM architectures.


| Operating System | Versions | 
| --- | --- | 
|  Amazon Linux  |  Amazon Linux 2023 (AL2023)  | 
|  Bottlerocket  |  v1.37.0 and above (VMware variants, available for Kubernetes v1.28 and above)  | 
|  Ubuntu  |  Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04  | 
|  Red Hat Enterprise Linux  |  RHEL 8, RHEL 9  | 

## Operating system considerations
<a name="_operating_system_considerations"></a>

### General
<a name="_general"></a>
+ The Amazon EKS Hybrid Nodes CLI (`nodeadm`) can be used to simplify the installation and configuration of the hybrid nodes components and dependencies. You can run the `nodeadm install` process during your operating system image build pipelines or at runtime on each on-premises host. For more information on the components that `nodeadm` installs, see the [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md).
+ If you are using a proxy in your on-premises environment to reach the internet, there is additional operating system configuration required for the install and upgrade processes to configure your package manager to use the proxy. See [Configure proxy for hybrid nodes](hybrid-nodes-proxy.md) for instructions.

### Bottlerocket
<a name="_bottlerocket"></a>
+ The steps and tools to connect a Bottlerocket node are different than the steps for other operating systems and are covered separately in [Connect hybrid nodes with Bottlerocket](hybrid-nodes-bottlerocket.md), instead of the steps in [Connect hybrid nodes](hybrid-nodes-join.md).
+ The steps for Bottlerocket don’t use the hybrid nodes CLI tool, `nodeadm`.
+ Only VMware variants of Bottlerocket version v1.37.0 and above are supported with EKS Hybrid Nodes. VMware variants of Bottlerocket are available for Kubernetes versions v1.28 and above. [Other Bottlerocket variants](https://bottlerocket.dev/en/os/1.36.x/concepts/variants) are not supported as the hybrid nodes operating system. NOTE: VMware variants of Bottlerocket are only available for the x86_64 architecture.

### Containerd
<a name="_containerd"></a>
+ Containerd is the standard Kubernetes container runtime and is a dependency for hybrid nodes, as well as all Amazon EKS node compute types. The Amazon EKS Hybrid Nodes CLI (`nodeadm`) attempts to install containerd during the `nodeadm install` process. You can configure the containerd installation at `nodeadm install` runtime with the `--containerd-source` command line option. Valid options are `none`, `distro`, and `docker`. If you are using RHEL, `distro` is not a valid option and you can either configure `nodeadm` to install the containerd build from Docker’s repos or you can manually install containerd. When using AL2023 or Ubuntu, `nodeadm` defaults to installing containerd from the operating system distribution. If you do not want nodeadm to install containerd, use the `--containerd-source none` option.
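
As a sketch of how these options fit together, the following runs `nodeadm install` with the containerd installation skipped. The Kubernetes version and credential provider values are placeholders; see the [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md) for the exact command syntax.

```
# Hedged sketch of nodeadm install with the containerd installation skipped.
# K8S_VERSION is a placeholder (for example 1.31); verify flags against the nodeadm reference.
nodeadm install K8S_VERSION --credential-provider ssm --containerd-source none
```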

### Ubuntu
<a name="_ubuntu"></a>
+ If you are using Ubuntu 24.04, you may need to update your version of containerd or change your AppArmor configuration to adopt a fix that allows pods to properly terminate; see [Ubuntu #2065423](https://bugs.launchpad.net/ubuntu/+source/containerd-app/+bug/2065423). A reboot is required to apply changes to the AppArmor profile. The latest version of Ubuntu 24.04 has an updated containerd version in its package manager with the fix (containerd version 1.7.19 or later).

### ARM
<a name="_arm"></a>
+ If you are using ARM hardware, an ARMv8.2 compliant processor with the Cryptography Extension (ARMv8.2+crypto) is required to run version 1.31 and above of the EKS kube-proxy add-on. All Raspberry Pi systems prior to the Raspberry Pi 5, as well as Cortex-A72 based processors, do not meet this requirement. As a workaround, you can continue to use version 1.30 of the EKS kube-proxy add-on until it reaches end of extended support in July of 2026 (see the [Kubernetes release calendar](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html)), or use a custom kube-proxy image from upstream. A sketch of pinning the add-on version follows the log excerpt below.
+ The following error message in the kube-proxy log indicates this incompatibility:

```
Fatal glibc error: This version of Amazon Linux requires a newer ARM64 processor compliant with at least ARM architecture 8.2-a with Cryptographic extensions. On EC2 this is Graviton 2 or later.
```
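
As a hedged example of the workaround, you can pin the EKS kube-proxy add-on to a 1.30 build. The cluster name and the add-on version string below are placeholders; list the available builds first and pick one.

```
# List available kube-proxy add-on builds for Kubernetes 1.30, then pin the add-on to one of them.
# CLUSTER_NAME and KUBE_PROXY_1_30_VERSION are placeholders.
aws eks describe-addon-versions --addon-name kube-proxy --kubernetes-version 1.30 \
    --query 'addons[].addonVersions[].addonVersion'

aws eks update-addon --cluster-name CLUSTER_NAME --addon-name kube-proxy \
    --addon-version KUBE_PROXY_1_30_VERSION
```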

## Building operating system images
<a name="_building_operating_system_images"></a>

Amazon EKS provides [example Packer templates](https://github.com/aws/eks-hybrid/tree/main/example/packer) you can use to create operating system images that include `nodeadm` and configure it to run at host-startup. This process is recommended to avoid pulling the hybrid nodes dependencies individually on each host and to automate the hybrid nodes bootstrap process. You can use the example Packer templates with an Ubuntu 22.04, Ubuntu 24.04, RHEL 8 or RHEL 9 ISO image and can output images with these formats: OVA, Qcow2, or raw.

### Prerequisites
<a name="_prerequisites"></a>

Before using the example Packer templates, you must have the following installed on the machine from where you are running Packer.
+ Packer version 1.11.0 or higher. For instructions on installing Packer, see [Install Packer](https://developer.hashicorp.com/packer/tutorials/docker-get-started/get-started-install-cli) in the Packer documentation.
+ If building OVAs, VMware vSphere plugin 1.4.0 or higher
+ If building `Qcow2` or raw images, QEMU plugin version 1.x

### Set Environment Variables
<a name="_set_environment_variables"></a>

Before running the Packer build, set the following environment variables on the machine from where you are running Packer.

 **General** 

The following environment variables must be set for building images with all operating systems and output formats.


| Environment Variable | Type | Description | 
| --- | --- | --- | 
|  `PKR_SSH_PASSWORD`  |  String  |  Packer uses the `ssh_username` and `ssh_password` variables to SSH into the created machine when provisioning. This needs to match the passwords used to create the initial user within the respective OS’s kickstart or user-data files. The default is set as "builder" or "ubuntu" depending on the OS. When setting your password, make sure to change it within the corresponding `ks.cfg` or `user-data` file to match.  | 
|  `ISO_URL`  |  String  |  URL of the ISO to use. Can be a web link to download from a server, or an absolute path to a local file  | 
|  `ISO_CHECKSUM`  |  String  |  Associated checksum for the supplied ISO.  | 
|  `CREDENTIAL_PROVIDER`  |  String  |  Credential provider for hybrid nodes. Valid values are `ssm` (default) for SSM hybrid activations and `iam` for IAM Roles Anywhere  | 
|  `K8S_VERSION`  |  String  |  Kubernetes version for hybrid nodes (for example `1.31`). For supported Kubernetes versions, see [Amazon EKS supported versions](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html).  | 
|  `NODEADM_ARCH`  |  String  |  Architecture for `nodeadm install`. Select `amd` or `arm`.  | 
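
For example, you might set the general variables like the following before running Packer. All values are placeholders; match `PKR_SSH_PASSWORD` to the password in your `ks.cfg` or `user-data` file.

```
# Example values only; replace each with the values for your environment.
export PKR_SSH_PASSWORD="builder"       # must match the password in your ks.cfg or user-data file
export ISO_URL="https://example.com/path/to/os-image.iso"   # or an absolute local path
export ISO_CHECKSUM="REPLACE_WITH_ISO_CHECKSUM"             # checksum of the supplied ISO
export CREDENTIAL_PROVIDER="ssm"        # or "iam" for IAM Roles Anywhere
export K8S_VERSION="1.31"
export NODEADM_ARCH="amd"               # or "arm"
```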

 **RHEL** 

If you are using RHEL, the following environment variables must be set.


| Environment Variable | Type | Description | 
| --- | --- | --- | 
|  `RH_USERNAME`  |  String  |  RHEL subscription manager username  | 
|  `RH_PASSWORD`  |  String  |  RHEL subscription manager password  | 
|  `RHEL_VERSION`  |  String  |  RHEL ISO version being used. Valid values are `8` or `9`.  | 

 **Ubuntu** 

There are no Ubuntu-specific environment variables required.

 **vSphere** 

If you are building a VMware vSphere OVA, the following environment variables must be set.


| Environment Variable | Type | Description | 
| --- | --- | --- | 
|  `VSPHERE_SERVER`  |  String  |  vSphere server address  | 
|  `VSPHERE_USER`  |  String  |  vSphere username  | 
|  `VSPHERE_PASSWORD`  |  String  |  vSphere password  | 
|  `VSPHERE_DATACENTER`  |  String  |  vSphere datacenter name  | 
|  `VSPHERE_CLUSTER`  |  String  |  vSphere cluster name  | 
|  `VSPHERE_DATASTORE`  |  String  |  vSphere datastore name  | 
|  `VSPHERE_NETWORK`  |  String  |  vSphere network name  | 
|  `VSPHERE_OUTPUT_FOLDER`  |  String  |  vSphere output folder for the templates  | 

 **QEMU** 

If you are building `Qcow2` or raw images with the QEMU builder, the following environment variable must be set.


| Environment Variable | Type | Description | 
| --- | --- | --- | 
|  `PACKER_OUTPUT_FORMAT`  |  String  |  Output format for the QEMU builder. Valid values are `qcow2` and `raw`.  | 

 **Validate template** 

Before running your build, validate your template with the following command after setting your environment variables. Replace `template.pkr.hcl` if you are using a different name for your template.

```
packer validate template.pkr.hcl
```

### Build images
<a name="_build_images"></a>

Build your images with the following commands and use the `-only` flag to specify the target and operating system for your images. Replace `template.pkr.hcl` if you are using a different name for your template.

 **vSphere OVAs** 

**Note**  
If you are using RHEL with vSphere you need to convert the kickstart files to an OEMDRV image and pass it as an ISO to boot from. For more information, see the [Packer Readme](https://github.com/aws/eks-hybrid/tree/main/example/packer#utilizing-rhel-with-vsphere) in the EKS Hybrid Nodes GitHub Repository.

 **Ubuntu 22.04 OVA** 

```
packer build -only=general-build.vsphere-iso.ubuntu22 template.pkr.hcl
```

 **Ubuntu 24.04 OVA** 

```
packer build -only=general-build.vsphere-iso.ubuntu24 template.pkr.hcl
```

 **RHEL 8 OVA** 

```
packer build -only=general-build.vsphere-iso.rhel8 template.pkr.hcl
```

 **RHEL 9 OVA** 

```
packer build -only=general-build.vsphere-iso.rhel9 template.pkr.hcl
```

 **QEMU** 

**Note**  
If you are building an image for a specific host CPU that does not match your builder host, see the [QEMU](https://www.qemu.org/docs/master/system/qemu-cpu-models.html) documentation for the name that matches your host CPU and use the `-cpu` flag with the name of the host CPU when you run the following commands.

 **Ubuntu 22.04 Qcow2 / Raw** 

```
packer build -only=general-build.qemu.ubuntu22 template.pkr.hcl
```

 **Ubuntu 24.04 Qcow2 / Raw** 

```
packer build -only=general-build.qemu.ubuntu24 template.pkr.hcl
```

 **RHEL 8 Qcow2 / Raw** 

```
packer build -only=general-build.qemu.rhel8 template.pkr.hcl
```

 **RHEL 9 Qcow2 / Raw** 

```
packer build -only=general-build.qemu.rhel9 template.pkr.hcl
```

### Pass nodeadm configuration through user-data
<a name="_pass_nodeadm_configuration_through_user_data"></a>

You can pass configuration for `nodeadm` in your user-data through cloud-init to configure and automatically connect hybrid nodes to your EKS cluster at host startup. Below is an example for how to accomplish this when using VMware vSphere as the infrastructure for your hybrid nodes.

1. Install the `govc` CLI following the instructions in the [govc readme](https://github.com/vmware/govmomi/blob/main/govc/README.md) on GitHub.

1. After running the Packer build in the previous section and provisioning your template, you can clone your template to create multiple nodes using the following command. You must clone the template for each new VM you are creating that will be used for hybrid nodes. Replace the variables in the command below with the values for your environment. The `VM_NAME` in the command below is used as your `NODE_NAME` when you inject the names for your VMs via your `metadata.yaml` file.

   ```
   govc vm.clone -vm "/PATH/TO/TEMPLATE" -ds="YOUR_DATASTORE" \
       -on=false -template=false -folder=/FOLDER/TO/SAVE/VM "VM_NAME"
   ```

1. After cloning the template for each of your new VMs, create a `userdata.yaml` and `metadata.yaml` for your VMs. Your VMs can share the same `userdata.yaml` and `metadata.yaml` and you will populate these on a per VM basis in the steps below. The `nodeadm` configuration is created and defined in the `write_files` section of your `userdata.yaml`. The example below uses AWS SSM hybrid activations as the on-premises credential provider for hybrid nodes. For more information on `nodeadm` configuration, see the [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md).

    **userdata.yaml:** 

   ```
   #cloud-config
   users:
     - name: # username for login. Use 'builder' for RHEL or 'ubuntu' for Ubuntu.
       passwd: # password to login. Default is 'builder' for RHEL.
       groups: [adm, cdrom, dip, plugdev, lxd, sudo]
       lock-passwd: false
       sudo: ALL=(ALL) NOPASSWD:ALL
       shell: /bin/bash
   
   write_files:
     - path: /usr/local/bin/nodeConfig.yaml
       permissions: '0644'
       content: |
         apiVersion: node.eks.aws/v1alpha1
         kind: NodeConfig
         spec:
             cluster:
                 name: # Cluster Name
                 region: # AWS region
             hybrid:
                 ssm:
                     activationCode: # Your ssm activation code
                     activationId: # Your ssm activation id
   
   runcmd:
     - /usr/local/bin/nodeadm init -c file:///usr/local/bin/nodeConfig.yaml >> /var/log/nodeadm-init.log 2>&1
   ```

    **metadata.yaml:** 

   Create a `metadata.yaml` for your environment. Keep the `"$NODE_NAME"` variable format in the file as this will be populated with values in a subsequent step.

   ```
   instance-id: "$NODE_NAME"
   local-hostname: "$NODE_NAME"
   network:
     version: 2
     ethernets:
       nics:
         match:
           name: ens*
         dhcp4: yes
   ```

1. Add the `userdata.yaml` and `metadata.yaml` files as `gzip+base64` strings with the following commands. The following commands should be run for each of the VMs you are creating. Replace `VM_NAME` with the name of the VM you are updating and `YOUR_DATACENTER` with the name of your vSphere datacenter.

   ```
   export NODE_NAME="VM_NAME"
   export USER_DATA=$(gzip -c9 <userdata.yaml | base64)
   
   govc vm.change -dc="YOUR_DATACENTER" -vm "$NODE_NAME" -e guestinfo.userdata="${USER_DATA}"
   govc vm.change -dc="YOUR_DATACENTER" -vm "$NODE_NAME" -e guestinfo.userdata.encoding=gzip+base64
   
   envsubst '$NODE_NAME' < metadata.yaml > metadata.yaml.tmp
   export METADATA=$(gzip -c9 <metadata.yaml.tmp | base64)
   
   govc vm.change -dc="YOUR_DATACENTER" -vm "$NODE_NAME" -e guestinfo.metadata="${METADATA}"
   govc vm.change -dc="YOUR_DATACENTER" -vm "$NODE_NAME" -e guestinfo.metadata.encoding=gzip+base64
   ```

1. Power on your new VMs, which should automatically connect to the EKS cluster you configured.

   ```
   govc vm.power -on "${NODE_NAME}"
   ```

# Prepare credentials for hybrid nodes
<a name="hybrid-nodes-creds"></a>

Amazon EKS Hybrid Nodes use temporary IAM credentials provisioned by AWS SSM hybrid activations or AWS IAM Roles Anywhere to authenticate with the Amazon EKS cluster. You must use either AWS SSM hybrid activations or AWS IAM Roles Anywhere with the Amazon EKS Hybrid Nodes CLI (`nodeadm`). You should not use both AWS SSM hybrid activations and AWS IAM Roles Anywhere. We recommend that you use AWS SSM hybrid activations if you do not have existing Public Key Infrastructure (PKI) with a Certificate Authority (CA) and certificates for your on-premises environments. If you do have existing PKI and certificates on-premises, use AWS IAM Roles Anywhere.

## Hybrid Nodes IAM Role
<a name="hybrid-nodes-role"></a>

Before you can connect hybrid nodes to your Amazon EKS cluster, you must create an IAM role that will be used with AWS SSM hybrid activations or AWS IAM Roles Anywhere for your hybrid nodes credentials. After cluster creation, you will use this role with an Amazon EKS access entry or `aws-auth` ConfigMap entry to map the IAM role to Kubernetes Role-Based Access Control (RBAC). For more information on associating the Hybrid Nodes IAM role with Kubernetes RBAC, see [Prepare cluster access for hybrid nodes](hybrid-nodes-cluster-prep.md).

The Hybrid Nodes IAM role must have the following permissions. A hedged example policy is sketched after this list.
+ Permissions for `nodeadm` to use the `eks:DescribeCluster` action to gather information about the cluster to which you want to connect hybrid nodes. If you do not enable the `eks:DescribeCluster` action, then you must pass your Kubernetes API endpoint, cluster CA bundle, and service IPv4 CIDR in the node configuration you pass to the `nodeadm init` command.
+ Permissions for `nodeadm` to use the `eks:ListAccessEntries` action to list the access entries on the cluster to which you want to connect hybrid nodes. If you do not enable the `eks:ListAccessEntries` action, then you must pass the `--skip cluster-access-validation` flag when you run the `nodeadm init` command.
+ Permissions for the kubelet to use container images from Amazon Elastic Container Registry (Amazon ECR) as defined in the [AmazonEC2ContainerRegistryPullOnly](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonEC2ContainerRegistryPullOnly.html) policy.
+ If using AWS SSM, permissions for `nodeadm init` to use AWS SSM hybrid activations as defined in the [https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSSMManagedInstanceCore.html](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSSMManagedInstanceCore.html) policy.
+ If using AWS SSM, permissions to use the `ssm:DeregisterManagedInstance` action and `ssm:DescribeInstanceInformation` action for `nodeadm uninstall` to deregister instances.
+ (Optional) Permissions for the Amazon EKS Pod Identity Agent to use the `eks-auth:AssumeRoleForPodIdentity` action to retrieve credentials for pods.
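
As a hedged sketch of the non-managed-policy permissions above, you could attach an inline policy like the following to the Hybrid Nodes IAM role, and attach `AmazonEC2ContainerRegistryPullOnly` and `AmazonSSMManagedInstanceCore` as managed policies separately. The role name is the one used elsewhere on this page, and the broad `Resource` scoping is only for illustration; scope it down per your requirements (for example, to your AWS SSM hybrid activation as described later on this page).

```
# Hedged sketch: inline policy covering the eks, ssm, and eks-auth actions listed above.
# Resource scoping is intentionally broad for illustration; narrow it for production use.
cat > hybrid-nodes-permissions.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster",
        "eks:ListAccessEntries",
        "ssm:DeregisterManagedInstance",
        "ssm:DescribeInstanceInformation",
        "eks-auth:AssumeRoleForPodIdentity"
      ],
      "Resource": "*"
    }
  ]
}
EOF

aws iam put-role-policy \
    --role-name AmazonEKSHybridNodesRole \
    --policy-name EKSHybridNodesPermissions \
    --policy-document file://hybrid-nodes-permissions.json
```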

## Setup AWS SSM hybrid activations
<a name="hybrid-nodes-ssm"></a>

Before setting up AWS SSM hybrid activations, you must have a Hybrid Nodes IAM role created and configured. For more information, see [Create the Hybrid Nodes IAM role](#hybrid-nodes-create-role). Follow the instructions at [Create a hybrid activation to register nodes with Systems Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/hybrid-activation-managed-nodes.html) in the AWS Systems Manager User Guide to create an AWS SSM hybrid activation for your hybrid nodes. The Activation Code and ID you receive is used with `nodeadm` when you register your hosts as hybrid nodes with your Amazon EKS cluster. You can come back to this step at a later point after you have created and prepared your Amazon EKS clusters for hybrid nodes.

**Important**  
Systems Manager immediately returns the Activation Code and ID to the console or the command window, depending on how you created the activation. Copy this information and store it in a safe place. If you navigate away from the console or close the command window, you might lose this information. If you lose it, you must create a new activation.

By default, AWS SSM hybrid activations are active for 24 hours. You can alternatively specify an `--expiration-date` when you create your hybrid activation in timestamp format, such as `2024-08-01T00:00:00`. When you use AWS SSM as your credential provider, the node name for your hybrid nodes is not configurable, and is auto-generated by AWS SSM. You can view and manage the AWS SSM Managed Instances in the AWS Systems Manager console under Fleet Manager. You can register up to 1,000 standard [hybrid-activated nodes](https://docs.aws.amazon.com/systems-manager/latest/userguide/activations.html) per account per AWS Region at no additional cost. However, registering more than 1,000 hybrid nodes requires that you activate the advanced-instances tier. There is a charge to use the advanced-instances tier that is not included in the [Amazon EKS Hybrid Nodes pricing](https://aws.amazon.com/eks/pricing/). For more information, see [AWS Systems Manager Pricing](https://aws.amazon.com/systems-manager/pricing/).

See the example below for how to create an AWS SSM hybrid activation with your Hybrid Nodes IAM role. When you use AWS SSM hybrid activations for your hybrid nodes credentials, the names of your hybrid nodes will have the format `mi-012345678abcdefgh` and the temporary credentials provisioned by AWS SSM are valid for 1 hour. You cannot alter the node name or credential duration when using AWS SSM as your credential provider. The temporary credentials are automatically rotated by AWS SSM and the rotation does not impact the status of your nodes or applications.

We recommend that you use one AWS SSM hybrid activation per EKS cluster to scope the AWS SSM `ssm:DeregisterManagedInstance` permission of the Hybrid Nodes IAM role to only be able to deregister instances that are associated with your AWS SSM hybrid activation. In the example on this page, a tag with the EKS cluster ARN is applied, which you can use to map your AWS SSM hybrid activation to the EKS cluster. You can alternatively use your preferred tag and method of scoping the AWS SSM permissions based on your permission boundaries and requirements. The `REGISTRATION_LIMIT` option in the command below is an integer used to limit the number of machines that can use the AWS SSM hybrid activation (for example, `10`).

```
aws ssm create-activation \
     --region AWS_REGION \
     --default-instance-name eks-hybrid-nodes \
     --description "Activation for EKS hybrid nodes" \
     --iam-role AmazonEKSHybridNodesRole \
     --tags Key=EKSClusterARN,Value=arn:aws:eks:AWS_REGION:AWS_ACCOUNT_ID:cluster/CLUSTER_NAME \
     --registration-limit REGISTRATION_LIMIT
```

Review the instructions on [Create a hybrid activation to register nodes with Systems Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide/hybrid-activation-managed-nodes.html) for more information about the available configuration settings for AWS SSM hybrid activations.
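
The Activation Code is returned only when the activation is created, but you can list your existing activations at any time to check their expiration, registration count, and associated IAM role. The following query is a minimal sketch; the field names assume the standard `DescribeActivations` response shape.

```
aws ssm describe-activations \
    --region AWS_REGION \
    --query 'ActivationList[].{Id:ActivationId,Role:IamRole,Registered:RegistrationsCount,Expired:Expired}'
```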

## Setup AWS IAM Roles Anywhere
<a name="hybrid-nodes-iam-roles-anywhere"></a>

Follow the instructions at [Getting started with IAM Roles Anywhere](https://docs.aws.amazon.com/rolesanywhere/latest/userguide/getting-started.html) in the IAM Roles Anywhere User Guide to set up the trust anchor and profile you will use for temporary IAM credentials for your Hybrid Nodes IAM role. When you create your profile, you can create it without adding any roles. You can create this profile, return to these steps to create your Hybrid Nodes IAM role, and then add your role to your profile after it is created. You can alternatively use the AWS CloudFormation steps later on this page to complete your IAM Roles Anywhere setup for hybrid nodes.

When you add the Hybrid Nodes IAM role to your profile, select **Accept custom role session name** in the **Custom role session name** panel at the bottom of the **Edit profile** page in the AWS IAM Roles Anywhere console. This corresponds to the [acceptRoleSessionName](https://docs.aws.amazon.com/rolesanywhere/latest/APIReference/API_CreateProfile.html#rolesanywhere-CreateProfile-request-acceptRoleSessionName) field of the `CreateProfile` API. This allows you to supply a custom node name for your hybrid nodes in the configuration you pass to `nodeadm` during the bootstrap process. Passing a custom node name during the `nodeadm init` process is required. You can update your profile to accept a custom role session name after creating your profile.
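
If you prefer to configure the profile with the AWS CLI rather than the console, the following is a sketch of creating a profile that accepts a custom role session name. The profile name is a placeholder, the role ARN assumes the role name used elsewhere on this page, and the `--accept-role-session-name` flag corresponds to the `acceptRoleSessionName` field of the `CreateProfile` API.

```
aws rolesanywhere create-profile \
    --name eks-hybrid-nodes-profile \
    --role-arns arn:aws:iam::AWS_ACCOUNT_ID:role/AmazonEKSHybridNodesRole \
    --accept-role-session-name \
    --enabled
```

If you have not created the Hybrid Nodes IAM role yet, you can instead create the profile from the console as described above and add the role to it later.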

You can configure the credential validity duration with AWS IAM Roles Anywhere through the [durationSeconds](https://docs.aws.amazon.com/rolesanywhere/latest/userguide/authentication-create-session.html#credentials-object) field of your AWS IAM Roles Anywhere profile. The default duration is 1 hour and the maximum is 12 hours. The `MaxSessionDuration` setting on your Hybrid Nodes IAM role must be greater than the `durationSeconds` setting on your AWS IAM Roles Anywhere profile. For more information on `MaxSessionDuration`, see the [UpdateRole API documentation](https://docs.aws.amazon.com/IAM/latest/APIReference/API_UpdateRole.html).
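
For example, to issue 4-hour credentials you could raise the role’s maximum session duration above that value and then set the profile duration. The commands below are a sketch; `PROFILE_ID` is a placeholder for your AWS IAM Roles Anywhere profile ID.

```
# Allow role sessions of up to 6 hours (the value is in seconds)
aws iam update-role \
    --role-name AmazonEKSHybridNodesRole \
    --max-session-duration 21600

# Issue 4-hour credentials from the IAM Roles Anywhere profile
aws rolesanywhere update-profile \
    --profile-id PROFILE_ID \
    --duration-seconds 14400
```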

The per-machine certificates and keys you generate from your certificate authority (CA) must be placed in the `/etc/iam/pki` directory on each hybrid node with the file names `server.pem` for the certificate and `server.key` for the key.
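
For example, on each host you might stage the files as follows. The source file names are placeholders for whatever your CA issued; only the destination directory and file names are fixed.

```
sudo mkdir -p /etc/iam/pki
sudo cp node-certificate.pem /etc/iam/pki/server.pem
sudo cp node-private-key.pem /etc/iam/pki/server.key
```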

## Create the Hybrid Nodes IAM role
<a name="hybrid-nodes-create-role"></a>

To run the steps in this section, the IAM principal using the AWS console or AWS CLI must have the following permissions.
+  `iam:CreatePolicy` 
+  `iam:CreateRole` 
+  `iam:AttachRolePolicy` 
+ If using AWS IAM Roles Anywhere
  +  `rolesanywhere:CreateTrustAnchor` 
  +  `rolesanywhere:CreateProfile` 
  +  `iam:PassRole` 

### AWS CloudFormation
<a name="hybrid-nodes-creds-cloudformation"></a>

Install and configure the AWS CLI, if you haven’t already. See [Installing or updating to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

 **Steps for AWS SSM hybrid activations** 

The CloudFormation stack creates the Hybrid Nodes IAM Role with the permissions outlined above. The CloudFormation template does not create the AWS SSM hybrid activation.

1. Download the AWS SSM CloudFormation template for hybrid nodes:

   ```
   curl -OL 'https://raw.githubusercontent.com/aws/eks-hybrid/refs/heads/main/example/hybrid-ssm-cfn.yaml'
   ```

1. Create a `cfn-ssm-parameters.json` with the following options:

   1. Replace `ROLE_NAME` with the name for your Hybrid Nodes IAM role. By default, the CloudFormation template uses `AmazonEKSHybridNodesRole` as the name of the role it creates if you do not specify a name.

   1. Replace `TAG_KEY` with the AWS SSM resource tag key you used when creating your AWS SSM hybrid activation. The combination of the tag key and tag value is used in the condition for the `ssm:DeregisterManagedInstance` to only allow the Hybrid Nodes IAM role to deregister the AWS SSM managed instances that are associated with your AWS SSM hybrid activation. In the CloudFormation template, `TAG_KEY` defaults to `EKSClusterARN`.

   1. Replace `TAG_VALUE` with the AWS SSM resource tag value you used when creating your AWS SSM hybrid activation. The combination of the tag key and tag value is used in the condition for the `ssm:DeregisterManagedInstance` to only allow the Hybrid Nodes IAM role to deregister the AWS SSM managed instances that are associated with your AWS SSM hybrid activation. If you are using the default `TAG_KEY` of `EKSClusterARN`, then pass your EKS cluster ARN as the `TAG_VALUE`. EKS cluster ARNs have the format ` arn:aws:eks:AWS_REGION:AWS_ACCOUNT_ID:cluster/CLUSTER_NAME`.

      ```
      {
        "Parameters": {
          "RoleName": "ROLE_NAME",
          "SSMDeregisterConditionTagKey": "TAG_KEY",
          "SSMDeregisterConditionTagValue": "TAG_VALUE"
        }
      }
      ```

1. Deploy the CloudFormation stack. Replace `STACK_NAME` with your name for the CloudFormation stack.

   ```
   aws cloudformation deploy \
       --stack-name STACK_NAME \
       --template-file hybrid-ssm-cfn.yaml \
       --parameter-overrides file://cfn-ssm-parameters.json \
       --capabilities CAPABILITY_NAMED_IAM
   ```
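
After the stack finishes deploying, you can optionally confirm that the role exists before you create your AWS SSM hybrid activation. This check is a sketch that assumes you kept the default role name.

```
aws iam get-role \
    --role-name AmazonEKSHybridNodesRole \
    --query 'Role.Arn'
```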

 **Steps for AWS IAM Roles Anywhere** 

The CloudFormation stack creates the AWS IAM Roles Anywhere trust anchor with the certificate authority (CA) you configure, creates the AWS IAM Roles Anywhere profile, and creates the Hybrid Nodes IAM role with the permissions outlined previously.

1. To set up a certificate authority (CA)

   1. To use an AWS Private CA resource, open the [AWS Private Certificate Authority console](https://console.aws.amazon.com/acm-pca/home). Follow the instructions in the [AWS Private CA User Guide](https://docs.aws.amazon.com/privateca/latest/userguide/PcaWelcome.html).

   1. To use an external CA, follow the instructions provided by the CA. You provide the certificate body in a later step.

   1. Certificates issued from public CAs cannot be used as trust anchors.

1. Download the AWS IAM Roles Anywhere CloudFormation template for hybrid nodes:

   ```
   curl -OL 'https://raw.githubusercontent.com/aws/eks-hybrid/refs/heads/main/example/hybrid-ira-cfn.yaml'
   ```

1. Create a `cfn-iamra-parameters.json` with the following options:

   1. Replace `ROLE_NAME` with the name for your Hybrid Nodes IAM role. By default, the CloudFormation template uses `AmazonEKSHybridNodesRole` as the name of the role it creates if you do not specify a name.

   1. Replace `CERT_ATTRIBUTE` with the per-machine certificate attribute that uniquely identifies your host. The certificate attribute you use must match the `nodeName` you use for the `nodeadm` configuration when you connect hybrid nodes to your cluster. For more information, see the [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md). By default, the CloudFormation template uses `${aws:PrincipalTag/x509Subject/CN}` as the `CERT_ATTRIBUTE`, which corresponds to the CN field of your per-machine certificates. You can alternatively pass `${aws:PrincipalTag/x509SAN/Name/CN}` as your `CERT_ATTRIBUTE`.

   1. Replace `CA_CERT_BODY` with the certificate body of your CA without line breaks. The `CA_CERT_BODY` must be in Privacy Enhanced Mail (PEM) format. If you have a CA certificate in PEM format, remove the line breaks and the `BEGIN CERTIFICATE` and `END CERTIFICATE` lines before placing the CA certificate body in your `cfn-iamra-parameters.json` file (see the example command after these steps).

      ```
      {
        "Parameters": {
          "RoleName": "ROLE_NAME",
          "CertAttributeTrustPolicy": "CERT_ATTRIBUTE",
          "CABundleCert": "CA_CERT_BODY"
        }
      }
      ```

1. Deploy the CloudFormation template. Replace `STACK_NAME` with your name for the CloudFormation stack.

   ```
   aws cloudformation deploy \
       --stack-name STACK_NAME \
       --template-file hybrid-ira-cfn.yaml \
       --parameter-overrides file://cfn-iamra-parameters.json \
       --capabilities CAPABILITY_NAMED_IAM
   ```
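
As noted in the parameter steps above, the `CA_CERT_BODY` value is the PEM body with the `BEGIN CERTIFICATE` and `END CERTIFICATE` lines and all line breaks removed. One way to produce it, assuming your CA certificate is in a file named `ca.pem`, is:

```
grep -v 'CERTIFICATE' ca.pem | tr -d '\n'
```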

### AWS CLI
<a name="hybrid-nodes-creds-awscli"></a>

Install and configure the AWS CLI, if you haven’t already. See [Installing or updating to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

 **Create EKS Describe Cluster Policy** 

1. Create a file named `eks-describe-cluster-policy.json` with the following contents:

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "eks:DescribeCluster"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

1. Create the policy with the following command:

   ```
   aws iam create-policy \
       --policy-name EKSDescribeClusterPolicy \
       --policy-document file://eks-describe-cluster-policy.json
   ```

 **Steps for AWS SSM hybrid activations** 

1. Create a file named `eks-hybrid-ssm-policy.json` with the following contents. The policy grants permission for two actions: `ssm:DescribeInstanceInformation` and `ssm:DeregisterManagedInstance`. The policy restricts `ssm:DeregisterManagedInstance` to the AWS SSM managed instances associated with your AWS SSM hybrid activation, based on the resource tag condition in the policy.

   1. Replace `AWS_REGION` with the AWS Region for your AWS SSM hybrid activation.

   1. Replace `AWS_ACCOUNT_ID` with your AWS account ID.

   1. Replace `TAG_KEY` with the AWS SSM resource tag key you used when creating your AWS SSM hybrid activation. The combination of the tag key and tag value is used in the condition for the `ssm:DeregisterManagedInstance` to only allow the Hybrid Nodes IAM role to deregister the AWS SSM managed instances that are associated with your AWS SSM hybrid activation. In the CloudFormation template, `TAG_KEY` defaults to `EKSClusterARN`.

   1. Replace `TAG_VALUE` with the AWS SSM resource tag value you used when creating your AWS SSM hybrid activation. The combination of the tag key and tag value is used in the condition for the `ssm:DeregisterManagedInstance` to only allow the Hybrid Nodes IAM role to deregister the AWS SSM managed instances that are associated with your AWS SSM hybrid activation. If you are using the default `TAG_KEY` of `EKSClusterARN`, then pass your EKS cluster ARN as the `TAG_VALUE`. EKS cluster ARNs have the format ` arn:aws:eks:AWS_REGION:AWS_ACCOUNT_ID:cluster/CLUSTER_NAME`.

      ```
      {
          "Version":"2012-10-17",		 	 	 
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": "ssm:DescribeInstanceInformation",
                  "Resource": "*"
              },
              {
                  "Effect": "Allow",
                  "Action": "ssm:DeregisterManagedInstance",
                  "Resource": "arn:aws:ssm:us-east-1:123456789012:managed-instance/*",
                  "Condition": {
                      "StringEquals": {
                          "ssm:resourceTag/TAG_KEY": "TAG_VALUE"
                      }
                  }
              }
          ]
      }
      ```

1. Create the policy with the following command:

   ```
   aws iam create-policy \
       --policy-name EKSHybridSSMPolicy \
       --policy-document file://eks-hybrid-ssm-policy.json
   ```

1. Create a file named `eks-hybrid-ssm-trust.json`. Replace `AWS_REGION` with the AWS Region of your AWS SSM hybrid activation and `AWS_ACCOUNT_ID` with your AWS account ID.

   ```
   {
      "Version":"2012-10-17",		 	 	 
      "Statement":[
         {
            "Sid":"",
            "Effect":"Allow",
            "Principal":{
               "Service":"ssm.amazonaws.com"
            },
            "Action":"sts:AssumeRole",
            "Condition":{
               "StringEquals":{
                  "aws:SourceAccount":"123456789012"
               },
               "ArnEquals":{
                  "aws:SourceArn":"arn:aws:ssm:us-east-1:123456789012:*"
               }
            }
         }
      ]
   }
   ```

1. Create the role with the following command.

   ```
   aws iam create-role \
       --role-name AmazonEKSHybridNodesRole \
       --assume-role-policy-document file://eks-hybrid-ssm-trust.json
   ```

1. Attach the `EKSDescribeClusterPolicy` and the `EKSHybridSSMPolicy` you created in the previous steps. Replace `AWS_ACCOUNT_ID` with your AWS account ID.

   ```
   aws iam attach-role-policy \
       --role-name AmazonEKSHybridNodesRole \
       --policy-arn arn:aws:iam::AWS_ACCOUNT_ID:policy/EKSDescribeClusterPolicy
   ```

   ```
   aws iam attach-role-policy \
       --role-name AmazonEKSHybridNodesRole \
       --policy-arn arn:aws:iam::AWS_ACCOUNT_ID:policy/EKSHybridSSMPolicy
   ```

1. Attach the `AmazonEC2ContainerRegistryPullOnly` and `AmazonSSMManagedInstanceCore` AWS managed policies.

   ```
   aws iam attach-role-policy \
       --role-name AmazonEKSHybridNodesRole \
       --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPullOnly
   ```

   ```
   aws iam attach-role-policy \
       --role-name AmazonEKSHybridNodesRole \
       --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
   ```
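
Optionally, verify that all four policies are attached to the role before you continue.

```
aws iam list-attached-role-policies \
    --role-name AmazonEKSHybridNodesRole \
    --query 'AttachedPolicies[].PolicyName'
```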

 **Steps for AWS IAM Roles Anywhere** 

To use AWS IAM Roles Anywhere, you must set up your AWS IAM Roles Anywhere trust anchor before creating the Hybrid Nodes IAM Role. See [Setup AWS IAM Roles Anywhere](#hybrid-nodes-iam-roles-anywhere) for instructions.
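
If you need to look up the ARN of the trust anchor for the trust policy below, you can list your trust anchors. This is a sketch; the output field names assume the standard `ListTrustAnchors` response.

```
aws rolesanywhere list-trust-anchors \
    --query 'trustAnchors[].{name:name,arn:trustAnchorArn}'
```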

1. Create a file named `eks-hybrid-iamra-trust.json`. Replace `TRUST_ANCHOR_ARN` with the ARN of the trust anchor you created in the [Setup AWS IAM Roles Anywhere](#hybrid-nodes-iam-roles-anywhere) steps. The condition in this trust policy restricts AWS IAM Roles Anywhere to assuming the Hybrid Nodes IAM role and exchanging temporary IAM credentials only when the role session name matches the CN in the x509 certificate installed on your hybrid nodes. You can alternatively use other certificate attributes to uniquely identify your node. The certificate attribute that you use in the trust policy must correspond to the `nodeName` you set in your `nodeadm` configuration. For more information, see the [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md).

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "rolesanywhere.amazonaws.com"
               },
               "Action": [
                   "sts:TagSession",
                   "sts:SetSourceIdentity"
               ],
               "Condition": {
                   "StringEquals": {
                       "aws:PrincipalTag/x509Subject/CN": "${aws:PrincipalTag/x509Subject/CN}"
                   },
                   "ArnEquals": {
                       "aws:SourceArn": "arn:aws:rolesanywhere:us-east-1:123456789012:trust-anchor/TA_ID"
                   }
               }
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "rolesanywhere.amazonaws.com"
               },
               "Action": "sts:AssumeRole",
               "Condition": {
                   "StringEquals": {
                       "sts:RoleSessionName": "${aws:PrincipalTag/x509Subject/CN}",
                       "aws:PrincipalTag/x509Subject/CN": "${aws:PrincipalTag/x509Subject/CN}"
                   },
                   "ArnEquals": {
                       "aws:SourceArn": "arn:aws:rolesanywhere:us-east-1:123456789012:trust-anchor/TA_ID"
                   }
               }
           }
       ]
   }
   ```

1. Create the role with the following command.

   ```
   aws iam create-role \
       --role-name AmazonEKSHybridNodesRole \
       --assume-role-policy-document file://eks-hybrid-iamra-trust.json
   ```

1. Attach the `EKSDescribeClusterPolicy` you created in the previous steps. Replace `AWS_ACCOUNT_ID` with your AWS account ID.

   ```
   aws iam attach-role-policy \
       --role-name AmazonEKSHybridNodesRole \
       --policy-arn arn:aws:iam::AWS_ACCOUNT_ID:policy/EKSDescribeClusterPolicy
   ```

1. Attach the `AmazonEC2ContainerRegistryPullOnly` AWS managed policy.

   ```
   aws iam attach-role-policy \
       --role-name AmazonEKSHybridNodesRole \
       --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPullOnly
   ```

### AWS Management Console
<a name="hybrid-nodes-creds-console"></a>

 **Create EKS Describe Cluster Policy** 

1. Open the [Amazon IAM console](https://console.aws.amazon.com/iam/home) 

1. In the left navigation pane, choose **Policies**.

1. On the **Policies** page, choose **Create policy**.

1. On the **Specify permissions** page, in the **Select a service** panel, choose **EKS**.

   1. Filter actions for **DescribeCluster** and select the **DescribeCluster** Read action.

   1. Choose **Next**.

1. On the **Review and create** page, do the following:

   1. Enter a **Policy name** for your policy such as `EKSDescribeClusterPolicy`.

   1. Choose **Create policy**.

 **Steps for AWS SSM hybrid activations** 

1. Open the [Amazon IAM console](https://console.aws.amazon.com/iam/home) 

1. In the left navigation pane, choose **Policies**.

1. On the **Policies** page, choose **Create policy**.

1. On the **Specify permissions** page, in the **Policy editor**, choose **JSON** in the top right navigation. Paste the following snippet. Replace `AWS_REGION` with the AWS Region of your AWS SSM hybrid activation and replace `AWS_ACCOUNT_ID` with your AWS account ID. Replace `TAG_KEY` and `TAG_VALUE` with the AWS SSM resource tag key and value you used when creating your AWS SSM hybrid activation.

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Action": "ssm:DescribeInstanceInformation",
               "Resource": "*"
           },
           {
               "Effect": "Allow",
               "Action": "ssm:DeregisterManagedInstance",
               "Resource": "arn:aws:ssm:us-east-1:123456789012:managed-instance/*",
               "Condition": {
                   "StringEquals": {
                       "ssm:resourceTag/TAG_KEY": "TAG_VALUE"
                   }
               }
           }
       ]
   }
   ```

   1. Choose **Next**.

1. On the **Review and create** page, do the following:

   1. Enter a **Policy name** for your policy, such as `EKSHybridSSMPolicy`.

   1. Choose **Create policy**.

1. In the left navigation pane, choose **Roles**.

1. On the **Roles** page, choose **Create role**.

1. On the **Select trusted entity** page, do the following:

   1. In the **Trusted entity type** section, choose **Custom trust policy**. Paste the following into the **Custom trust policy** editor. Replace `AWS_REGION` with the AWS Region of your AWS SSM hybrid activation and `AWS_ACCOUNT_ID` with your AWS account ID.

      ```
      {
         "Version":"2012-10-17",		 	 	 
         "Statement":[
            {
               "Sid":"",
               "Effect":"Allow",
               "Principal":{
                  "Service":"ssm.amazonaws.com"
               },
               "Action":"sts:AssumeRole",
               "Condition":{
                  "StringEquals":{
                     "aws:SourceAccount":"123456789012"
                  },
                  "ArnEquals":{
                     "aws:SourceArn":"arn:aws:ssm:us-east-1:123456789012:*"
                  }
               }
            }
         ]
      }
      ```

   1. Choose **Next**.

1. On the **Add permissions** page, attach a custom policy or do the following:

   1. In the **Filter policies** box, enter `EKSDescribeClusterPolicy`, or the name of the policy you created above. Select the check box to the left of your policy name in the search results.

   1. In the **Filter policies** box, enter `EKSHybridSSMPolicy`, or the name of the policy you created above. Select the check box to the left of your policy name in the search results.

   1. In the **Filter policies** box, enter `AmazonEC2ContainerRegistryPullOnly`. Select the check box to the left of `AmazonEC2ContainerRegistryPullOnly` in the search results.

   1. In the **Filter policies** box, enter `AmazonSSMManagedInstanceCore`. Select the check box to the left of `AmazonSSMManagedInstanceCore` in the search results.

   1. Choose **Next**.

1. On the **Name, review, and create** page, do the following:

   1. For **Role name**, enter a unique name for your role, such as `AmazonEKSHybridNodesRole`.

   1. For **Description**, replace the current text with descriptive text such as `Amazon EKS - Hybrid Nodes role`.

   1. Choose **Create role**.

 **Steps for AWS IAM Roles Anywhere** 

To use AWS IAM Roles Anywhere, you must set up your AWS IAM Roles Anywhere trust anchor before creating the Hybrid Nodes IAM Role. See [Setup AWS IAM Roles Anywhere](#hybrid-nodes-iam-roles-anywhere) for instructions.

1. Open the [Amazon IAM console](https://console.aws.amazon.com/iam/home) 

1. In the left navigation pane, choose **Roles**.

1. On the **Roles** page, choose **Create role**.

1. On the **Select trusted entity** page, do the following:

   1. In the **Trusted entity type** section, choose **Custom trust policy**. Paste the following into the **Custom trust policy** editor. Replace `TRUST_ANCHOR_ARN` with the ARN of the trust anchor you created in the [Setup AWS IAM Roles Anywhere](#hybrid-nodes-iam-roles-anywhere) steps. The condition in this trust policy restricts AWS IAM Roles Anywhere to assuming the Hybrid Nodes IAM role and exchanging temporary IAM credentials only when the role session name matches the CN in the x509 certificate installed on your hybrid nodes. You can alternatively use other certificate attributes to uniquely identify your node. The certificate attribute that you use in the trust policy must correspond to the `nodeName` you set in your `nodeadm` configuration. For more information, see the [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md).

      ```
      {
          "Version":"2012-10-17",		 	 	 
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                      "Service": "rolesanywhere.amazonaws.com"
                  },
                  "Action": [
                      "sts:TagSession",
                      "sts:SetSourceIdentity"
                  ],
                  "Condition": {
                      "StringEquals": {
                          "aws:PrincipalTag/x509Subject/CN": "${aws:PrincipalTag/x509Subject/CN}"
                      },
                      "ArnEquals": {
                          "aws:SourceArn": "arn:aws:rolesanywhere:us-east-1:123456789012:trust-anchor/TA_ID"
                      }
                  }
              },
              {
                  "Effect": "Allow",
                  "Principal": {
                      "Service": "rolesanywhere.amazonaws.com"
                  },
                  "Action": "sts:AssumeRole",
                  "Condition": {
                      "StringEquals": {
                          "sts:RoleSessionName": "${aws:PrincipalTag/x509Subject/CN}",
                          "aws:PrincipalTag/x509Subject/CN": "${aws:PrincipalTag/x509Subject/CN}"
                      },
                      "ArnEquals": {
                          "aws:SourceArn": "arn:aws:rolesanywhere:us-east-1:123456789012:trust-anchor/TA_ID"
                      }
                  }
              }
          ]
      }
      ```

   1. Choose **Next**.

1. On the **Add permissions** page, attach a custom policy or do the following:

   1. In the **Filter policies** box, enter `EKSDescribeClusterPolicy`, or the name of the policy you created above. Select the check box to the left of your policy name in the search results.

   1. In the **Filter policies** box, enter `AmazonEC2ContainerRegistryPullOnly`. Select the check box to the left of `AmazonEC2ContainerRegistryPullOnly` in the search results.

   1. Choose **Next**.

1. On the **Name, review, and create** page, do the following:

   1. For **Role name**, enter a unique name for your role, such as `AmazonEKSHybridNodesRole`.

   1. For **Description**, replace the current text with descriptive text such as `Amazon EKS - Hybrid Nodes role`.

   1. Choose **Create role**.

# Create an Amazon EKS cluster with hybrid nodes
<a name="hybrid-nodes-cluster-create"></a>

This topic provides an overview of the available options and describes what to consider when you create a hybrid nodes-enabled Amazon EKS cluster. EKS Hybrid Nodes have the same [Kubernetes version support](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html) as Amazon EKS clusters with cloud nodes, including standard and extended support.

If you are not planning to use EKS Hybrid Nodes, see the primary Amazon EKS create cluster documentation at [Create an Amazon EKS cluster](create-cluster.md).

## Prerequisites
<a name="hybrid-nodes-cluster-create-prep"></a>
+ The [Prerequisite setup for hybrid nodes](hybrid-nodes-prereqs.md) completed. Before you create your hybrid nodes-enabled cluster, you must have your on-premises node and optionally pod CIDRs identified, your VPC and subnets created according to the EKS and hybrid nodes requirements, and your security group with inbound rules for your on-premises node and optionally pod CIDRs. For more information on these prerequisites, see [Prepare networking for hybrid nodes](hybrid-nodes-networking.md).
+ The latest version of the AWS Command Line Interface (AWS CLI) installed and configured on your device. To check your current version, use `aws --version`. Package managers such as yum, apt-get, or Homebrew for macOS are often several versions behind the latest version of the AWS CLI. To install the latest version, see [Installing or updating to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and [Configuring settings for the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html#cli-configure-quickstart-config) in the AWS Command Line Interface User Guide.
+ An [IAM principal](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles#iam-term-principal) with permissions to create IAM roles and attach policies, and to create and describe EKS clusters.

## Considerations
<a name="hybrid-nodes-cluster-create-consider"></a>
+ Your cluster must use either `API` or `API_AND_CONFIG_MAP` for the cluster authentication mode.
+ Your cluster must use IPv4 address family.
+ Your cluster must use either Public or Private cluster endpoint connectivity. Your cluster cannot use “Public and Private” cluster endpoint connectivity, because the Amazon EKS Kubernetes API server endpoint will resolve to the public IPs for hybrid nodes running outside of your VPC.
+ OIDC authentication is supported for EKS clusters with hybrid nodes.
+ You can add, change, or remove the hybrid nodes configuration of an existing cluster. For more information, see [Enable hybrid nodes on an existing Amazon EKS cluster or modify configuration](hybrid-nodes-cluster-update.md).

## Step 1: Create cluster IAM role
<a name="hybrid-nodes-cluster-create-iam"></a>

If you already have a cluster IAM role, or you’re going to create your cluster with `eksctl` or AWS CloudFormation, then you can skip this step. By default, `eksctl` and the AWS CloudFormation template create the cluster IAM role for you.

1. Create a file named `eks-cluster-role-trust-policy.json` with the following contents.

   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Service": "eks.amazonaws.com"
         },
         "Action": "sts:AssumeRole"
       }
     ]
   }
   ```

1. Create the Amazon EKS cluster IAM role. If necessary, preface eks-cluster-role-trust-policy.json with the path on your computer that you wrote the file to in the previous step. The command associates the trust policy that you created in the previous step to the role. To create an IAM role, the [IAM principal](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles#iam-term-principal) that is creating the role must be assigned the `iam:CreateRole` action (permission).

   ```
   aws iam create-role \
       --role-name myAmazonEKSClusterRole \
       --assume-role-policy-document file://"eks-cluster-role-trust-policy.json"
   ```

1. You can assign either the Amazon EKS managed policy or create your own custom policy. For the minimum permissions that you must use in your custom policy, see [Amazon EKS node IAM role](create-node-role.md). Attach the Amazon EKS managed policy named `AmazonEKSClusterPolicy` to the role. To attach an IAM policy to an [IAM principal](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles#iam-term-principal), the principal that is attaching the policy must be assigned one of the following IAM actions (permissions): `iam:AttachUserPolicy` or `iam:AttachRolePolicy`.

   ```
   aws iam attach-role-policy \
       --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy \
       --role-name myAmazonEKSClusterRole
   ```

## Step 2: Create hybrid nodes-enabled cluster
<a name="hybrid-nodes-cluster-create-cluster"></a>

You can create a cluster by using:
+  [eksctl](#hybrid-nodes-cluster-create-eksctl) 
+  [AWS CloudFormation](#hybrid-nodes-cluster-create-cfn) 
+  [AWS CLI](#hybrid-nodes-cluster-create-cli) 
+  [AWS Management Console](#hybrid-nodes-cluster-create-console) 

### Create hybrid nodes-enabled cluster - eksctl
<a name="hybrid-nodes-cluster-create-eksctl"></a>

You need to install the latest version of the `eksctl` command line tool. To install or update `eksctl`, see [Installation](https://eksctl.io/installation) in the `eksctl` documentation.

1. Create `cluster-config.yaml` to define a hybrid nodes-enabled Amazon EKS IPv4 cluster. Make the following replacements in your `cluster-config.yaml`. For a full list of settings, see the [eksctl documentation](https://eksctl.io/getting-started/).

   1. Replace `CLUSTER_NAME` with a name for your cluster. The name can contain only alphanumeric characters (case-sensitive) and hyphens. It must start with an alphanumeric character and can’t be longer than 100 characters. The name must be unique within the AWS Region and AWS account that you’re creating the cluster in.

   1. Replace `AWS_REGION` with the AWS Region that you want to create your cluster in.

   1. Replace `K8S_VERSION` with any [Amazon EKS supported version](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html).

   1. Replace `CREDS_PROVIDER` with `ssm` or `ira` based on the credential provider you configured in the steps for [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md).

   1. Replace `CA_BUNDLE_CERT` if your credential provider is set to `ira`, which uses AWS IAM Roles Anywhere as the credential provider. The `CA_BUNDLE_CERT` is the certificate authority (CA) certificate body and depends on your choice of CA. The certificate must be in Privacy Enhanced Mail (PEM) format.

   1. Replace `GATEWAY_ID` with the ID of your virtual private gateway or transit gateway to be attached to your VPC.

   1. Replace `REMOTE_NODE_CIDRS` with the on-premises node CIDR for your hybrid nodes.

   1. Replace `REMOTE_POD_CIDRS` with the on-premises pod CIDR for workloads running on hybrid nodes, or remove the line from your configuration if you are not running webhooks on hybrid nodes. You must configure `REMOTE_POD_CIDRS` if your CNI does not use Network Address Translation (NAT) or masquerading for pod IP addresses when pod traffic leaves your on-premises hosts. You must also configure `REMOTE_POD_CIDRS` if you are running webhooks on hybrid nodes. For more information, see [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md).

   1. Your on-premises node and pod CIDR blocks must meet the following requirements:

      1. Be within one of the IPv4 RFC-1918 ranges: `10.0.0.0/8`, `172.16.0.0/12`, or `192.168.0.0/16`, or within the CGNAT range defined by RFC 6598: `100.64.0.0/10`.

      1. Not overlap with each other, the `VPC CIDR` for your cluster, or your Kubernetes service IPv4 CIDR.

         ```
         apiVersion: eksctl.io/v1alpha5
         kind: ClusterConfig
         
         metadata:
           name: CLUSTER_NAME
           region: AWS_REGION
           version: "K8S_VERSION"
         
         remoteNetworkConfig:
           iam:
             provider: CREDS_PROVIDER # default SSM, can also be set to IRA
             # caBundleCert: CA_BUNDLE_CERT
           vpcGatewayID: GATEWAY_ID
           remoteNodeNetworks:
           - cidrs: ["REMOTE_NODE_CIDRS"]
           remotePodNetworks:
           - cidrs: ["REMOTE_POD_CIDRS"]
         ```

1. Run the following command:

   ```
   eksctl create cluster -f cluster-config.yaml
   ```

   Cluster provisioning takes several minutes. While the cluster is being created, several lines of output appear. The last line of output is similar to the following example line.

   ```
   [✓]  EKS cluster "CLUSTER_NAME" in "REGION" region is ready
   ```

1. Continue with [Step 3: Update kubeconfig](#hybrid-nodes-cluster-create-kubeconfig).

### Create hybrid nodes-enabled cluster - AWS CloudFormation
<a name="hybrid-nodes-cluster-create-cfn"></a>

The CloudFormation stack creates the EKS cluster IAM role and an EKS cluster with the `RemoteNodeNetwork` and `RemotePodNetwork` you specify. Modify the CloudFormation template if you need to customize settings for your EKS cluster that are not exposed in the template.

1. Download the CloudFormation template.

   ```
   curl -OL 'https://raw.githubusercontent.com/aws/eks-hybrid/refs/heads/main/example/hybrid-eks-cfn.yaml'
   ```

1. Create a `cfn-eks-parameters.json` and specify your configuration for each value.

   1.  `CLUSTER_NAME`: name of the EKS cluster to be created

   1.  `CLUSTER_ROLE_NAME`: name of the EKS cluster IAM role to be created. The default in the template is “EKSClusterRole”.

   1.  `SUBNET1_ID`: the ID of the first subnet you created in the prerequisite steps

   1.  `SUBNET2_ID`: the ID of the second subnet you created in the prerequisite steps

   1.  `SG_ID`: the security group ID you created in the prerequisite steps

   1.  `REMOTE_NODE_CIDRS`: the on-premises node CIDR for your hybrid nodes

   1.  `REMOTE_POD_CIDRS`: the on-premises pod CIDR for workloads running on hybrid nodes. You must configure `REMOTE_POD_CIDRS` if your CNI does not use Network Address Translation (NAT) or masquerading for pod IP addresses when pod traffic leaves your on-premises hosts. You must also configure `REMOTE_POD_CIDRS` if you are running webhooks on hybrid nodes. For more information, see [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md).

   1. Your on-premises node and pod CIDR blocks must meet the following requirements:

      1. Be within one of the IPv4 RFC-1918 ranges: `10.0.0.0/8`, `172.16.0.0/12`, or `192.168.0.0/16`, or within the CGNAT range defined by RFC 6598: `100.64.0.0/10`.

      1. Not overlap with each other, the `VPC CIDR` for your cluster, or your Kubernetes service IPv4 CIDR.

   1.  `CLUSTER_AUTH`: the cluster authentication mode for your cluster. Valid values are `API` and `API_AND_CONFIG_MAP`. The default in the template is `API_AND_CONFIG_MAP`.

   1.  `CLUSTER_ENDPOINT`: the cluster endpoint connectivity for your cluster. Valid values are “Public” and “Private”. The default in the template is Private, which means you will only be able to connect to the Kubernetes API endpoint from within your VPC.

   1.  `K8S_VERSION`: the Kubernetes version to use for your cluster. See [Amazon EKS supported versions](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html).

      ```
      {
        "Parameters": {
          "ClusterName": "CLUSTER_NAME",
          "ClusterRoleName": "CLUSTER_ROLE_NAME",
          "SubnetId1": "SUBNET1_ID",
          "SubnetId2": "SUBNET2_ID",
          "SecurityGroupId" "SG_ID",
          "RemoteNodeCIDR": "REMOTE_NODE_CIDRS",
          "RemotePodCIDR": "REMOTE_POD_CIDRS",
          "ClusterAuthMode": "CLUSTER_AUTH",
          "ClusterEndpointConnectivity": "CLUSTER_ENDPOINT",
          "K8sVersion": "K8S_VERSION"
        }
       }
      ```

1. Deploy the CloudFormation stack. Replace `STACK_NAME` with your name for the CloudFormation stack and `AWS_REGION` with your AWS Region where the cluster will be created.

   ```
   aws cloudformation deploy \
       --stack-name STACK_NAME \
       --region AWS_REGION \
       --template-file hybrid-eks-cfn.yaml \
       --parameter-overrides file://cfn-eks-parameters.json \
       --capabilities CAPABILITY_NAMED_IAM
   ```

   Cluster provisioning takes several minutes. You can check the status of your stack with the following command. Replace `STACK_NAME` with your name for the CloudFormation stack and `AWS_REGION` with your AWS Region where the cluster will be created.

   ```
   aws cloudformation describe-stacks \
       --stack-name STACK_NAME \
       --region AWS_REGION \
       --query 'Stacks[].StackStatus'
   ```

1. Continue with [Step 3: Update kubeconfig](#hybrid-nodes-cluster-create-kubeconfig).

### Create hybrid nodes-enabled cluster - AWS CLI
<a name="hybrid-nodes-cluster-create-cli"></a>

1. Run the following command to create a hybrid nodes-enabled EKS cluster. Before running the command, replace the following with your settings. For a full list of settings, see the [Create an Amazon EKS cluster](create-cluster.md) documentation.

   1.  `CLUSTER_NAME`: name of the EKS cluster to be created

   1.  `AWS_REGION`: AWS Region where the cluster will be created.

   1.  `K8S_VERSION`: the Kubernetes version to use for your cluster. See Amazon EKS supported versions.

   1.  `ROLE_ARN`: the Amazon EKS cluster role you configured for your cluster. See Amazon EKS cluster IAM role for more information.

   1.  `SUBNET1_ID`: the ID of the first subnet you created in the prerequisite steps

   1.  `SUBNET2_ID`: the ID of the second subnet you created in the prerequisite steps

   1.  `SG_ID`: the security group ID you created in the prerequisite steps

   1. You can use `API` and `API_AND_CONFIG_MAP` for your cluster access authentication mode. In the command below, the cluster access authentication mode is set to `API_AND_CONFIG_MAP`.

   1. You can use the `endpointPublicAccess` and `endpointPrivateAccess` parameters to enable or disable public and private access to your cluster’s Kubernetes API server endpoint. In the command below `endpointPublicAccess` is set to false and `endpointPrivateAccess` is set to true.

   1.  `REMOTE_NODE_CIDRS`: the on-premises node CIDR for your hybrid nodes.

   1.  `REMOTE_POD_CIDRS` (optional): the on-premises pod CIDR for workloads running on hybrid nodes.

   1. Your on-premises node and pod CIDR blocks must meet the following requirements:

      1. Be within one of the IPv4 RFC-1918 ranges: `10.0.0.0/8`, `172.16.0.0/12`, or `192.168.0.0/16`, or within the CGNAT range defined by RFC 6598: `100.64.0.0/10`.

      1. Not overlap with each other, the `VPC CIDR` for your Amazon EKS cluster, or your Kubernetes service IPv4 CIDR.

         ```
         aws eks create-cluster \
             --name CLUSTER_NAME \
             --region AWS_REGION \
             --kubernetes-version K8S_VERSION \
             --role-arn ROLE_ARN \
             --resources-vpc-config subnetIds=SUBNET1_ID,SUBNET2_ID,securityGroupIds=SG_ID,endpointPrivateAccess=true,endpointPublicAccess=false \
             --access-config authenticationMode=API_AND_CONFIG_MAP \
             --remote-network-config '{"remoteNodeNetworks":[{"cidrs":["REMOTE_NODE_CIDRS"]}],"remotePodNetworks":[{"cidrs":["REMOTE_POD_CIDRS"]}]}'
         ```

1. It takes several minutes to provision the cluster. You can query the status of your cluster with the following command. Replace `CLUSTER_NAME` with the name of the cluster you are creating and `AWS_REGION` with the AWS Region where the cluster is being created. Don’t proceed to the next step until the output returned is `ACTIVE`.

   ```
   aws eks describe-cluster \
       --name CLUSTER_NAME \
       --region AWS_REGION \
       --query "cluster.status"
   ```

1. Continue with [Step 3: Update kubeconfig](#hybrid-nodes-cluster-create-kubeconfig).

### Create hybrid nodes-enabled cluster - AWS Management Console
<a name="hybrid-nodes-cluster-create-console"></a>

1. Open the Amazon EKS console at [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. Choose **Add cluster** and then choose **Create**.

1. On the **Configure cluster** page, enter values for the following fields:

   1.  **Name** – A name for your cluster. The name can contain only alphanumeric characters (case-sensitive), hyphens, and underscores. It must start with an alphanumeric character and can’t be longer than 100 characters. The name must be unique within the AWS Region and AWS account that you’re creating the cluster in.

   1.  **Cluster IAM role** – Choose the Amazon EKS cluster IAM role that you created to allow the Kubernetes control plane to manage AWS resources on your behalf.

   1.  **Kubernetes version** – The version of Kubernetes to use for your cluster. We recommend selecting the latest version, unless you need an earlier version.

   1.  **Upgrade policy** - Choose either Extended or Standard.

      1.  **Extended:** This option supports the Kubernetes version for 26 months after the release date. The extended support period has an additional hourly cost that begins after the standard support period ends. When extended support ends, your cluster will be auto upgraded to the next version.

      1.  **Standard:** This option supports the Kubernetes version for 14 months after the release date. There is no additional cost. When standard support ends, your cluster will be auto upgraded to the next version.

   1.  **Cluster access** - choose to allow or disallow cluster administrator access and select an authentication mode. The following authentication modes are supported for hybrid nodes-enabled clusters.

      1.  **EKS API**: The cluster will source authenticated IAM principals only from EKS access entry APIs.

      1.  **EKS API and ConfigMap**: The cluster will source authenticated IAM principals from both EKS access entry APIs and the `aws-auth` ConfigMap.

   1.  **Secrets encryption** – (Optional) Choose to enable secrets encryption of Kubernetes secrets using a KMS key. You can also enable this after you create your cluster. Before you enable this capability, make sure that you’re familiar with the information in [Encrypt Kubernetes secrets with KMS on existing clusters](enable-kms.md).

   1.  **ARC zonal shift** - If enabled, EKS registers your cluster with ARC zonal shift so that you can use zonal shift to move application traffic away from an AZ.

   1.  **Tags** – (Optional) Add any tags to your cluster. For more information, see [Organize Amazon EKS resources with tags](eks-using-tags.md).

   1. When you’re done with this page, choose **Next**.

1. On the **Specify networking** page, select values for the following fields:

   1.  **VPC** – Choose an existing VPC that meets [View Amazon EKS networking requirements for VPC and subnets](network-reqs.md) and [Amazon EKS Hybrid Nodes requirements](hybrid-nodes-prereqs.md). Before choosing a VPC, we recommend that you’re familiar with all of the requirements and considerations in View Amazon EKS networking requirements for VPC, subnets, and hybrid nodes. You can’t change which VPC you want to use after cluster creation. If no VPCs are listed, then you need to create one first. For more information, see [Create an Amazon VPC for your Amazon EKS cluster](creating-a-vpc.md) and the [Amazon EKS Hybrid Nodes networking requirements](hybrid-nodes-prereqs.md).

   1.  **Subnets** – By default, all available subnets in the VPC specified in the previous field are preselected. You must select at least two.

   1.  **Security groups** – (Optional) Specify one or more security groups that you want Amazon EKS to associate to the network interfaces that it creates. At least one of the security groups you specify must have inbound rules for your on-premises node and optionally pod CIDRs. See the [Amazon EKS Hybrid Nodes networking requirements](hybrid-nodes-networking.md) for more information. Whether you choose any security groups or not, Amazon EKS creates a security group that enables communication between your cluster and your VPC. Amazon EKS associates this security group, and any that you choose, to the network interfaces that it creates. For more information about the cluster security group that Amazon EKS creates, see [View Amazon EKS security group requirements for clusters](sec-group-reqs.md). You can modify the rules in the cluster security group that Amazon EKS creates.

   1.  **Choose cluster IP address family** – You must choose IPv4 for hybrid nodes-enabled clusters.

   1. (Optional) Choose **Configure Kubernetes Service IP address range** and specify a **Service IPv4 range**.

   1. Choose **Configure remote networks to enable hybrid nodes** and specify your on-premises node and pod CIDRs for hybrid nodes.

   1. You must configure your remote pod CIDR if your CNI does not use Network Address Translation (NAT) or masquerading for pod IP addresses when pod traffic leaves your on-premises hosts. You must configure the remote pod CIDR if you are running webhooks on hybrid nodes.

   1. Your on-premises node and pod CIDR blocks must meet the following requirements:

      1. Be within one of the IPv4 RFC-1918 ranges: `10.0.0.0/8`, `172.16.0.0/12`, or `192.168.0.0/16`, or within the CGNAT range defined by RFC 6598: `100.64.0.0/10`.

      1. Not overlap with each other, the `VPC CIDR` for your cluster, or your Kubernetes service IPv4 CIDR

   1. For **Cluster endpoint access**, select an option. After your cluster is created, you can change this option. For hybrid nodes-enabled clusters, you must choose either Public or Private. Before selecting a non-default option, make sure to familiarize yourself with the options and their implications. For more information, see [Cluster API server endpoint](cluster-endpoint.md).

   1. When you’re done with this page, choose **Next**.

1. (Optional) On the **Configure observability** page, choose which Metrics and Control plane logging options to turn on. By default, each log type is turned off.

   1. For more information about the Prometheus metrics option, see [Monitor your cluster metrics with Prometheus](prometheus.md).

   1. For more information about the EKS control logging options, see [Send control plane logs to CloudWatch Logs](control-plane-logs.md).

   1. When you’re done with this page, choose **Next**.

1. On the **Select add-ons** page, choose the add-ons that you want to add to your cluster.

   1. You can choose as many **Amazon EKS add-ons** and **AWS Marketplace add-ons** as you require. Amazon EKS add-ons that are not compatible with hybrid nodes are marked with “Not compatible with Hybrid Nodes”, and those add-ons have an anti-affinity rule to prevent them from running on hybrid nodes. For more information, see [Configure add-ons for hybrid nodes](hybrid-nodes-add-ons.md). If an **AWS Marketplace** add-on that you want to install isn’t listed, you can search for available **AWS Marketplace add-ons** by entering text in the search box. You can also search by **category**, **vendor**, or **pricing model** and then choose the add-ons from the search results.

   1. Some add-ons, such as CoreDNS and kube-proxy, are installed by default. If you disable any of the default add-ons, this may affect your ability to run Kubernetes applications.

   1. When you’re done with this page, choose **Next**.

1. On the **Configure selected add-ons settings** page, select the version that you want to install.

   1. You can always update to a later version after cluster creation. You can update the configuration of each add-on after cluster creation. For more information about configuring add-ons, see [Update an Amazon EKS add-on](updating-an-add-on.md). For the add-ons versions that are compatible with hybrid nodes, see [Configure add-ons for hybrid nodes](hybrid-nodes-add-ons.md).

   1. When you’re done with this page, choose **Next**.

1. On the **Review and create** page, review the information that you entered or selected on the previous pages. If you need to make changes, choose **Edit**. When you’re satisfied, choose **Create**. The **Status** field shows **CREATING** while the cluster is provisioned. Cluster provisioning takes several minutes.

1. Continue with [Step 3: Update kubeconfig](#hybrid-nodes-cluster-create-kubeconfig).

## Step 3: Update kubeconfig
<a name="hybrid-nodes-cluster-create-kubeconfig"></a>

If you created your cluster using `eksctl`, then you can skip this step. This is because `eksctl` already completed this step for you. Enable `kubectl` to communicate with your cluster by adding a new context to the `kubectl` config file. For more information about how to create and update the file, see [Connect kubectl to an EKS cluster by creating a kubeconfig file](create-kubeconfig.md).

```
aws eks update-kubeconfig --name CLUSTER_NAME --region AWS_REGION
```

An example output is as follows.

```
Added new context arn:aws:eks:AWS_REGION:111122223333:cluster/CLUSTER_NAME to /home/username/.kube/config
```

Confirm communication with your cluster by running the following command.

```
kubectl get svc
```

An example output is as follows.

```
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   28h
```

## Step 4: Cluster setup
<a name="_step_4_cluster_setup"></a>

As a next step, see [Prepare cluster access for hybrid nodes](hybrid-nodes-cluster-prep.md) to enable access for your hybrid nodes to join your cluster.

# Enable hybrid nodes on an existing Amazon EKS cluster or modify configuration
<a name="hybrid-nodes-cluster-update"></a>

This topic provides an overview of the available options and describes what to consider when you add, change, or remove the hybrid nodes configuration for an Amazon EKS cluster.

To enable an Amazon EKS cluster to use hybrid nodes, add the IP address CIDR ranges of your on-premises node and optionally pod network in the `RemoteNetworkConfig` configuration. EKS uses this list of CIDRs to enable connectivity between the cluster and your on-premises networks. For a full list of options when updating your cluster configuration, see the [UpdateClusterConfig](https://docs.aws.amazon.com/eks/latest/APIReference/API_UpdateClusterConfig.html) in the *Amazon EKS API Reference*.

You can do any of the following actions to the EKS Hybrid Nodes networking configuration in a cluster:
+  [Add remote network configuration to enable EKS Hybrid Nodes in an existing cluster.](#hybrid-nodes-cluster-enable-existing) 
+  [Add, change, or remove the remote node networks or the remote pod networks in an existing cluster.](#hybrid-nodes-cluster-update-config) 
+  [Remove all remote node network CIDR ranges to disable EKS Hybrid Nodes in an existing cluster.](#hybrid-nodes-cluster-disable) 

## Prerequisites
<a name="hybrid-nodes-cluster-enable-prep"></a>
+ Before enabling your Amazon EKS cluster for hybrid nodes, ensure your environment meets the requirements outlined at [Prerequisite setup for hybrid nodes](hybrid-nodes-prereqs.md), and detailed at [Prepare networking for hybrid nodes](hybrid-nodes-networking.md), [Prepare operating system for hybrid nodes](hybrid-nodes-os.md), and [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md).
+ Your cluster must use IPv4 address family.
+ Your cluster must use either `API` or `API_AND_CONFIG_MAP` for the cluster authentication mode. The process for modifying the cluster authentication mode is described at [Change authentication mode to use access entries](setting-up-access-entries.md).
+ We recommend that you use either public or private endpoint access for the Amazon EKS Kubernetes API server endpoint, but not both. If you choose “Public and Private”, the Amazon EKS Kubernetes API server endpoint will always resolve to the public IPs for hybrid nodes running outside of your VPC, which can prevent your hybrid nodes from joining the cluster. The process for modifying network access to your cluster is described at [Cluster API server endpoint](cluster-endpoint.md).
+ The latest version of the AWS Command Line Interface (AWS CLI) installed and configured on your device. To check your current version, use `aws --version`. Package managers such as yum, apt-get, or Homebrew for macOS are often several versions behind the latest version of the AWS CLI. To install the latest version, see [Installing or updating to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and [Configuring settings for the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html#cli-configure-quickstart-config) in the AWS Command Line Interface User Guide.
+ An [IAM principal](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles#iam-term-principal) with permission to call [UpdateClusterConfig](https://docs.aws.amazon.com/eks/latest/APIReference/API_UpdateClusterConfig.html) on your Amazon EKS cluster.
+ Update add-ons to versions that are compatible with hybrid nodes. For the add-ons versions that are compatible with hybrid nodes, see [Configure add-ons for hybrid nodes](hybrid-nodes-add-ons.md).
+ If you are running add-ons that are not compatible with hybrid nodes, ensure that the add-on [DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/) or [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) has the following affinity rule to prevent deployment to hybrid nodes. Add the following affinity rule if it is not already present.

  ```
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: NotIn
            values:
            - hybrid
  ```

## Considerations
<a name="hybrid-nodes-cluster-enable-consider"></a>

The `remoteNetworkConfig` JSON object has the following behavior during an update:
+ Any existing part of the configuration that you don’t specify is unchanged. If you don’t specify `remoteNodeNetworks` or `remotePodNetworks`, that part remains the same.
+ If you are modifying either the `remoteNodeNetworks` or `remotePodNetworks` list of CIDRs, you must specify the complete list of CIDRs that you want in your final configuration. When you specify a change to either CIDR list, EKS replaces the original list during the update (see the example after this list).
+ Your on-premises node and pod CIDR blocks must meet the following requirements:

  1. Be within one of the IPv4 RFC-1918 ranges: `10.0.0.0/8`, `172.16.0.0/12`, or `192.168.0.0/16`, or within the CGNAT range defined by RFC 6598: `100.64.0.0/10`.

  1. Not overlap with each other, all CIDRs of the VPC for your Amazon EKS cluster, or your Kubernetes service IPv4 CIDR.
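
For example, if your cluster currently has a single remote node CIDR and you want to add a second one, include both CIDRs in the update rather than only the new one. The CIDR values below are placeholders.

```
aws eks update-cluster-config \
    --name CLUSTER_NAME \
    --region AWS_REGION \
    --remote-network-config '{"remoteNodeNetworks":[{"cidrs":["10.80.0.0/16","10.90.0.0/16"]}]}'
```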

## Enable hybrid nodes on an existing cluster
<a name="hybrid-nodes-cluster-enable-existing"></a>

You can enable EKS Hybrid Nodes in an existing cluster by using:
+  [AWS CloudFormation](#hybrid-nodes-cluster-enable-cfn) 
+  [AWS CLI](#hybrid-nodes-cluster-enable-cli) 
+  [AWS Management Console](#hybrid-nodes-cluster-enable-console) 

### Enable EKS Hybrid Nodes in an existing cluster - AWS CloudFormation
<a name="hybrid-nodes-cluster-enable-cfn"></a>

1. To enable EKS Hybrid Nodes in your cluster, add `RemoteNodeNetworks` and (optionally) `RemotePodNetworks` under `RemoteNetworkConfig` in your CloudFormation template and update the stack. Note that `RemoteNodeNetworks` is a list with a maximum of one item, and that item's `Cidrs` field is a list that can contain multiple IP CIDR ranges. An example with multiple CIDRs follows the snippet below.

   ```
   RemoteNetworkConfig:
     RemoteNodeNetworks:
       - Cidrs: [RemoteNodeCIDR]
     RemotePodNetworks:
       - Cidrs: [RemotePodCIDR]
   ```
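
   For example, a sketch with hypothetical CIDR values that places two node CIDRs in the single `Cidrs` list:

   ```
   RemoteNetworkConfig:
     RemoteNodeNetworks:
       - Cidrs: [10.80.0.0/16, 10.81.0.0/16]
     RemotePodNetworks:
       - Cidrs: [10.85.0.0/16]
   ```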

1. Continue to [Prepare cluster access for hybrid nodes](hybrid-nodes-cluster-prep.md).

### Enable EKS Hybrid Nodes in an existing cluster - AWS CLI
<a name="hybrid-nodes-cluster-enable-cli"></a>

1. Run the following command to enable `RemoteNetworkConfig` for EKS Hybrid Nodes for your EKS cluster. Before running the command, replace the following with your settings. For a full list of settings, see the [UpdateClusterConfig](https://docs.aws.amazon.com/eks/latest/APIReference/API_UpdateClusterConfig.html) in the *Amazon EKS API Reference*.

   1.  `CLUSTER_NAME`: name of the EKS cluster to update.

   1.  `AWS_REGION`: AWS Region where the EKS cluster is running.

   1.  `REMOTE_NODE_CIDRS`: the on-premises node CIDR for your hybrid nodes.

   1.  `REMOTE_POD_CIDRS` (optional): the on-premises pod CIDR for workloads running on hybrid nodes.

      ```
      aws eks update-cluster-config \
          --name CLUSTER_NAME \
          --region AWS_REGION \
          --remote-network-config '{"remoteNodeNetworks":[{"cidrs":["REMOTE_NODE_CIDRS"]}],"remotePodNetworks":[{"cidrs":["REMOTE_POD_CIDRS"]}]}'
      ```

1. It takes several minutes to update the cluster. You can query the status of your cluster with the following command. Replace `CLUSTER_NAME` with the name of the cluster you are modifying and `AWS_REGION` with the AWS Region where the cluster is running. Don’t proceed to the next step until the output returned is `ACTIVE`.

   ```
   aws eks describe-cluster \
       --name CLUSTER_NAME \
       --region AWS_REGION \
       --query "cluster.status"
   ```

1. Continue to [Prepare cluster access for hybrid nodes](hybrid-nodes-cluster-prep.md).

### Enable EKS Hybrid Nodes in an existing cluster - AWS Management Console
<a name="hybrid-nodes-cluster-enable-console"></a>

1. Open the Amazon EKS console at [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. Choose the name of the cluster to display your cluster information.

1. Choose the **Networking** tab and choose **Manage**.

1. In the dropdown, choose **Remote networks**.

1. Choose **Configure remote networks to enable hybrid nodes** and specify your on-premises node and pod CIDRs for hybrid nodes.

1. Choose **Save changes** to finish. Wait for the cluster status to return to **Active**.

1. Continue to [Prepare cluster access for hybrid nodes](hybrid-nodes-cluster-prep.md).

## Update hybrid nodes configuration in an existing cluster
<a name="hybrid-nodes-cluster-update-config"></a>

You can modify `remoteNetworkConfig` in an existing hybrid cluster by using any of the following:
+  [AWS CloudFormation](#hybrid-nodes-cluster-update-cfn) 
+  [AWS CLI](#hybrid-nodes-cluster-update-cli) 
+  [AWS Management Console](#hybrid-nodes-cluster-update-console) 

### Update hybrid configuration in an existing cluster - AWS CloudFormation
<a name="hybrid-nodes-cluster-update-cfn"></a>

1. Update your CloudFormation template with the new network CIDR values.

   ```
   RemoteNetworkConfig:
     RemoteNodeNetworks:
       - Cidrs: [NEW_REMOTE_NODE_CIDRS]
     RemotePodNetworks:
       - Cidrs: [NEW_REMOTE_POD_CIDRS]
   ```
**Note**  
When updating `RemoteNodeNetworks` or `RemotePodNetworks` CIDR lists, include all CIDRs (new and existing). EKS replaces the entire list during updates. Omitting these fields from the update request retains their existing configurations.

1. Update your CloudFormation stack with the modified template and wait for the stack update to complete.

### Update hybrid configuration in an existing cluster - AWS CLI
<a name="hybrid-nodes-cluster-update-cli"></a>

1. To modify the remote network CIDRs, run the following command. Replace the values with your settings:

   ```
   aws eks update-cluster-config \
       --name CLUSTER_NAME \
       --region AWS_REGION \
       --remote-network-config '{"remoteNodeNetworks":[{"cidrs":["NEW_REMOTE_NODE_CIDRS"]}],"remotePodNetworks":[{"cidrs":["NEW_REMOTE_POD_CIDRS"]}]}'
   ```
**Note**  
When updating `remoteNodeNetworks` or `remotePodNetworks` CIDR lists, include all CIDRs (new and existing). EKS replaces the entire list during updates. Omitting these fields from the update request retains their existing configurations.

1. Wait for the cluster status to return to ACTIVE before proceeding.

### Update hybrid configuration in an existing cluster - AWS Management Console
<a name="hybrid-nodes-cluster-update-console"></a>

1. Open the Amazon EKS console at [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. Choose the name of the cluster to display your cluster information.

1. Choose the **Networking** tab and choose **Manage**.

1. In the dropdown, choose **Remote networks**.

1. Update the CIDRs under `Remote node networks` and `Remote pod networks - Optional` as needed.

1. Choose **Save changes** and wait for the cluster status to return to **Active**.

## Disable hybrid nodes in an existing cluster
<a name="hybrid-nodes-cluster-disable"></a>

You can disable EKS Hybrid Nodes in an existing cluster by using:
+  [AWS CloudFormation](#hybrid-nodes-cluster-disable-cfn) 
+  [AWS CLI](#hybrid-nodes-cluster-disable-cli) 
+  [AWS Management Console](#hybrid-nodes-cluster-disable-console) 

### Disable EKS Hybrid Nodes in an existing cluster - AWS CloudFormation
<a name="hybrid-nodes-cluster-disable-cfn"></a>

1. To disable EKS Hybrid Nodes in your cluster, set `RemoteNodeNetworks` and `RemotePodNetworks` to empty arrays in your CloudFormation template and update the stack.

   ```
   RemoteNetworkConfig:
     RemoteNodeNetworks: []
     RemotePodNetworks: []
   ```

### Disable EKS Hybrid Nodes in an existing cluster - AWS CLI
<a name="hybrid-nodes-cluster-disable-cli"></a>

1. Run the following command to remove `RemoteNetworkConfig` from your EKS cluster. Before running the command, replace the following with your settings. For a full list of settings, see the [UpdateClusterConfig](https://docs.aws.amazon.com/eks/latest/APIReference/API_UpdateClusterConfig.html) in the *Amazon EKS API Reference*.

   1.  `CLUSTER_NAME`: name of the EKS cluster to update.

   1.  `AWS_REGION`: AWS Region where the EKS cluster is running.

      ```
      aws eks update-cluster-config \
          --name CLUSTER_NAME \
          --region AWS_REGION \
          --remote-network-config '{"remoteNodeNetworks":[],"remotePodNetworks":[]}'
      ```

1. It takes several minutes to update the cluster. You can query the status of your cluster with the following command. Replace `CLUSTER_NAME` with the name of the cluster you are modifying and `AWS_REGION` with the AWS Region where the cluster is running. Don’t proceed to the next step until the output returned is `ACTIVE`.

   ```
   aws eks describe-cluster \
       --name CLUSTER_NAME \
       --region AWS_REGION \
       --query "cluster.status"
   ```

### Disable EKS Hybrid Nodes in an existing cluster - AWS Management Console
<a name="hybrid-nodes-cluster-disable-console"></a>

1. Open the Amazon EKS console at [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. Choose the name of the cluster to display your cluster information.

1. Choose the **Networking** tab and choose **Manage**.

1. In the dropdown, choose **Remote networks**.

1. Choose **Configure remote networks to enable hybrid nodes** and remove all the CIDRs under `Remote node networks` and `Remote pod networks - Optional`.

1. Choose **Save changes** to finish. Wait for the cluster status to return to **Active**.

# Prepare cluster access for hybrid nodes
<a name="hybrid-nodes-cluster-prep"></a>

Before connecting hybrid nodes to your Amazon EKS cluster, you must enable your Hybrid Nodes IAM Role with Kubernetes permissions to join the cluster. See [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md) for information on how to create the Hybrid Nodes IAM role. Amazon EKS supports two ways to associate IAM principals with Kubernetes Role-Based Access Control (RBAC), Amazon EKS access entries and the `aws-auth` ConfigMap. For more information on Amazon EKS access management, see [Grant IAM users and roles access to Kubernetes APIs](grant-k8s-access.md).

Use the procedures below to associate your Hybrid Nodes IAM role with Kubernetes permissions. To use Amazon EKS access entries, your cluster must have been created with the `API` or `API_AND_CONFIG_MAP` authentication modes. To use the `aws-auth` ConfigMap, your cluster must have been created with the `API_AND_CONFIG_MAP` authentication mode. The `CONFIG_MAP`-only authentication mode is not supported for hybrid nodes-enabled Amazon EKS clusters.

## Using Amazon EKS access entries for Hybrid Nodes IAM role
<a name="_using_amazon_eks_access_entries_for_hybrid_nodes_iam_role"></a>

There is an Amazon EKS access entry type for hybrid nodes named `HYBRID_LINUX` that can be used with an IAM role. With this access entry type, the username is automatically set to `system:node:{{SessionName}}`. For more information on creating access entries, see [Create access entries](creating-access-entries.md).

### AWS CLI
<a name="shared_aws_cli"></a>

1. You must have the latest version of the AWS CLI installed and configured on your device. To check your current version, use `aws --version`. Package managers such as yum, apt-get, or Homebrew for macOS are often several versions behind the latest version of the AWS CLI. To install the latest version, see [Installing or updating to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and [Configuring settings for the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html#cli-configure-quickstart-config) in the AWS Command Line Interface User Guide.

1. Create your access entry with the following command. Replace `CLUSTER_NAME` with the name of your cluster and `HYBRID_NODES_ROLE_ARN` with the ARN of the role you created in the steps for [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md).

   ```
   aws eks create-access-entry --cluster-name CLUSTER_NAME \
       --principal-arn HYBRID_NODES_ROLE_ARN \
       --type HYBRID_LINUX
   ```
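
   To confirm that the access entry was created, you can optionally list the cluster's access entries and check that the ARN of your Hybrid Nodes IAM role appears in the output:

   ```
   aws eks list-access-entries --cluster-name CLUSTER_NAME
   ```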

### AWS Management Console
<a name="hybrid-nodes-cluster-prep-console"></a>

1. Open the Amazon EKS console at [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. Choose the name of your hybrid nodes-enabled cluster.

1. Choose the **Access** tab.

1. Choose **Create access entry**.

1. For **IAM principal**, select the Hybrid Nodes IAM role you created in the steps for [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md).

1. For **Type**, select **Hybrid Linux**.

1. (Optional) For **Tags**, assign tags to the access entry, for example, to make it easier to find all resources with the same tag.

1. Choose **Skip to review and create**. You cannot add policies to the Hybrid Linux access entry or change its access scope.

1. Review the configuration for your access entry. If anything looks incorrect, choose **Previous** to go back through the steps and correct the error. If the configuration is correct, choose **Create**.

## Using aws-auth ConfigMap for Hybrid Nodes IAM role
<a name="_using_aws_auth_configmap_for_hybrid_nodes_iam_role"></a>

In the following steps, you will create or update the `aws-auth` ConfigMap with the ARN of the Hybrid Nodes IAM Role you created in the steps for [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md).

1. Check to see if you have an existing `aws-auth` ConfigMap for your cluster. Note that if you are using a specific `kubeconfig` file, use the `--kubeconfig` flag.

   ```
   kubectl describe configmap -n kube-system aws-auth
   ```

1. If you are shown an `aws-auth` ConfigMap, then update it as needed.

   1. Open the ConfigMap for editing.

      ```
      kubectl edit -n kube-system configmap/aws-auth
      ```

   1. Add a new `mapRoles` entry as needed. Replace `HYBRID_NODES_ROLE_ARN` with the ARN of your Hybrid Nodes IAM role. Note, `{{SessionName}}` is the correct template format to save in the ConfigMap. Do not replace it with other values.

      ```
      data:
        mapRoles: |
          - groups:
            - system:bootstrappers
            - system:nodes
            rolearn: HYBRID_NODES_ROLE_ARN
            username: system:node:{{SessionName}}

   1. Save the file and exit your text editor.

1. If there is not an existing `aws-auth` ConfigMap for your cluster, create it with the following command. Replace `HYBRID_NODES_ROLE_ARN` with the ARN of your Hybrid Nodes IAM role. Note that `{{SessionName}}` is the correct template format to save in the ConfigMap. Do not replace it with other values.

   ```
   kubectl apply -f=/dev/stdin <<-EOF
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: aws-auth
     namespace: kube-system
   data:
     mapRoles: |
       - groups:
         - system:bootstrappers
         - system:nodes
         rolearn: HYBRID_NODES_ROLE_ARN
         username: system:node:{{SessionName}}
   EOF
   ```
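
   Whether you edited an existing ConfigMap or created a new one, you can optionally confirm that the role mapping is present:

   ```
   kubectl get configmap aws-auth -n kube-system -o yaml
   ```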

# Run on-premises workloads on hybrid nodes
<a name="hybrid-nodes-tutorial"></a>

In an EKS cluster with hybrid nodes enabled, you can run on-premises and edge applications on your own infrastructure with the same Amazon EKS clusters, features, and tools that you use in AWS Cloud.

The following sections contain step-by-step instructions for using hybrid nodes.

**Topics**
+ [Connect hybrid nodes](hybrid-nodes-join.md)
+ [Connect hybrid nodes with Bottlerocket](hybrid-nodes-bottlerocket.md)
+ [Upgrade hybrid nodes](hybrid-nodes-upgrade.md)
+ [Patch hybrid nodes](hybrid-nodes-security.md)
+ [Delete hybrid nodes](hybrid-nodes-remove.md)

# Connect hybrid nodes
<a name="hybrid-nodes-join"></a>

**Note**  
The following steps apply to hybrid nodes running compatible operating systems except Bottlerocket. For steps to connect a hybrid node that runs Bottlerocket, see [Connect hybrid nodes with Bottlerocket](hybrid-nodes-bottlerocket.md).

This topic describes how to connect hybrid nodes to an Amazon EKS cluster. After your hybrid nodes join the cluster, they will appear with status Not Ready in the Amazon EKS console and in Kubernetes-compatible tooling such as kubectl. After completing the steps on this page, proceed to [Configure CNI for hybrid nodes](hybrid-nodes-cni.md) to make your hybrid nodes ready to run applications.

## Prerequisites
<a name="_prerequisites"></a>

Before connecting hybrid nodes to your Amazon EKS cluster, make sure you have completed the prerequisite steps.
+ You have network connectivity from your on-premises environment to the AWS Region hosting your Amazon EKS cluster. See [Prepare networking for hybrid nodes](hybrid-nodes-networking.md) for more information.
+ You have a compatible operating system for hybrid nodes installed on your on-premises hosts. See [Prepare operating system for hybrid nodes](hybrid-nodes-os.md) for more information.
+ You have created your Hybrid Nodes IAM role and set up your on-premises credential provider (AWS Systems Manager hybrid activations or AWS IAM Roles Anywhere). See [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md) for more information.
+ You have created your hybrid nodes-enabled Amazon EKS cluster. See [Create an Amazon EKS cluster with hybrid nodes](hybrid-nodes-cluster-create.md) for more information.
+ You have associated your Hybrid Nodes IAM role with Kubernetes Role-Based Access Control (RBAC) permissions. See [Prepare cluster access for hybrid nodes](hybrid-nodes-cluster-prep.md) for more information.

## Step 1: Install the hybrid nodes CLI (`nodeadm`) on each on-premises host
<a name="_step_1_install_the_hybrid_nodes_cli_nodeadm_on_each_on_premises_host"></a>

If you are including the Amazon EKS Hybrid Nodes CLI (`nodeadm`) in your pre-built operating system images, you can skip this step. For more information on the hybrid nodes version of `nodeadm`, see [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md).

The hybrid nodes version of `nodeadm` is hosted in Amazon S3 and fronted by Amazon CloudFront. To install `nodeadm`, run the applicable command below on each of your on-premises hosts.

 **For x86_64 hosts:** 

```
curl -OL 'https://hybrid-assets.eks.amazonaws.com/releases/latest/bin/linux/amd64/nodeadm'
```

 **For ARM hosts:** 

```
curl -OL 'https://hybrid-assets.eks.amazonaws.com/releases/latest/bin/linux/arm64/nodeadm'
```

Add executable file permission to the downloaded binary on each host.

```
chmod +x nodeadm
```

## Step 2: Install the hybrid nodes dependencies with `nodeadm`
<a name="_step_2_install_the_hybrid_nodes_dependencies_with_nodeadm"></a>

If you are installing the hybrid nodes dependencies in pre-built operating system images, you can skip this step. The `nodeadm install` command can be used to install all dependencies required for hybrid nodes. The hybrid nodes dependencies include containerd, kubelet, kubectl, and AWS SSM or AWS IAM Roles Anywhere components. See [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md) for more information on the components and file locations installed by `nodeadm install`. See [Prepare networking for hybrid nodes](hybrid-nodes-networking.md) for more information on the domains that must be allowed in your on-premises firewall for the `nodeadm install` process.

Run the following command to install the hybrid nodes dependencies on your on-premises host.

**Important**  
The hybrid nodes CLI (`nodeadm`) must be run with a user that has sudo/root access on your host.
+ Replace `K8S_VERSION` with the Kubernetes minor version of your Amazon EKS cluster, for example `1.31`. See [Amazon EKS supported versions](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html) for a list of the supported Kubernetes versions.
+ Replace `CREDS_PROVIDER` with the on-premises credential provider you are using. Valid values are `ssm` for AWS SSM and `iam-ra` for AWS IAM Roles Anywhere.

```
nodeadm install K8S_VERSION --credential-provider CREDS_PROVIDER
```
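
For example, to install the dependencies for a cluster on Kubernetes version `1.31` using AWS SSM hybrid activations as the credential provider:

```
nodeadm install 1.31 --credential-provider ssm
```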

## Step 3: Connect hybrid nodes to your cluster
<a name="_step_3_connect_hybrid_nodes_to_your_cluster"></a>

Before connecting your hybrid nodes to your cluster, make sure that you have allowed the required access in your on-premises firewall and in the security group for your cluster so that the Amazon EKS control plane and your hybrid nodes can communicate with each other. Most issues at this step are related to the firewall configuration, security group configuration, or Hybrid Nodes IAM role configuration.

**Important**  
The hybrid nodes CLI (`nodeadm`) must be run with a user that has sudo/root access on your host.

1. Create a `nodeConfig.yaml` file on each host with the values for your deployment. For a full description of the available configuration settings, see [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md). If your Hybrid Nodes IAM role does not have permission for the `eks:DescribeCluster` action, you must pass your Kubernetes API endpoint, cluster CA bundle, and Kubernetes service IPv4 CIDR in the cluster section of your `nodeConfig.yaml`.

   1. Use the `nodeConfig.yaml` example below if you are using AWS SSM hybrid activations for your on-premises credentials provider.

      1. Replace `CLUSTER_NAME` with the name of your cluster.

      1. Replace `AWS_REGION` with the AWS Region hosting your cluster. For example, `us-west-2`.

      1. Replace `ACTIVATION_CODE` with the activation code you received when creating your AWS SSM hybrid activation. See [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md) for more information.

      1. Replace `ACTIVATION_ID` with the activation ID you received when creating your AWS SSM hybrid activation. You can retrieve this information from the AWS Systems Manager console or from the AWS CLI `aws ssm describe-activations` command.

         ```
         apiVersion: node.eks.aws/v1alpha1
         kind: NodeConfig
         spec:
           cluster:
             name: CLUSTER_NAME
             region: AWS_REGION
           hybrid:
             ssm:
               activationCode: ACTIVATION_CODE
               activationId: ACTIVATION_ID
         ```

   1. Use the `nodeConfig.yaml` example below if you are using AWS IAM Roles Anywhere for your on-premises credentials provider.

      1. Replace `CLUSTER_NAME` with the name of your cluster.

      1. Replace `AWS_REGION` with the AWS Region hosting your cluster. For example, `us-west-2`.

      1. Replace `NODE_NAME` with the name of your node. The node name must match the CN of the certificate on the host if you configured the trust policy of your Hybrid Nodes IAM role with the `"sts:RoleSessionName": "${aws:PrincipalTag/x509Subject/CN}"` resource condition. The `nodeName` you use must not be longer than 64 characters.

      1. Replace `TRUST_ANCHOR_ARN` with the ARN of the trust anchor you configured in the steps for [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md).

      1. Replace `PROFILE_ARN` with the ARN of the IAM Roles Anywhere profile you configured in the steps for [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md).

      1. Replace `ROLE_ARN` with the ARN of your Hybrid Nodes IAM role.

      1. Replace `CERTIFICATE_PATH` with the path on disk to your node certificate. If you don’t specify it, the default is `/etc/iam/pki/server.pem`.

      1. Replace `KEY_PATH` with the path on disk to your certificate private key. If you don’t specify it, the default is `/etc/iam/pki/server.key`.

         ```
         apiVersion: node.eks.aws/v1alpha1
         kind: NodeConfig
         spec:
           cluster:
             name: CLUSTER_NAME
             region: AWS_REGION
           hybrid:
             iamRolesAnywhere:
               nodeName: NODE_NAME
               trustAnchorArn: TRUST_ANCHOR_ARN
               profileArn: PROFILE_ARN
               roleArn: ROLE_ARN
               certificatePath: CERTIFICATE_PATH
               privateKeyPath: KEY_PATH
         ```

1. Run the `nodeadm init` command with your `nodeConfig.yaml` to connect your hybrid nodes to your Amazon EKS cluster.

   ```
   nodeadm init -c file://nodeConfig.yaml
   ```

If the above command completes successfully, your hybrid node has joined your Amazon EKS cluster. You can verify this in the Amazon EKS console by navigating to the Compute tab for your cluster ([ensure IAM principal has permissions to view](view-kubernetes-resources.md#view-kubernetes-resources-permissions)) or with `kubectl get nodes`.
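
For example, you can list the nodes and their reported status from a machine that has access to the cluster's Kubernetes API endpoint:

```
kubectl get nodes -o wide
```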

**Important**  
Your nodes will have status `Not Ready`, which is expected and is due to the lack of a CNI running on your hybrid nodes. If your nodes did not join the cluster, see [Troubleshooting hybrid nodes](hybrid-nodes-troubleshooting.md).

## Step 4: Configure a CNI for hybrid nodes
<a name="_step_4_configure_a_cni_for_hybrid_nodes"></a>

To make your hybrid nodes ready to run applications, continue with the steps on [Configure CNI for hybrid nodes](hybrid-nodes-cni.md).

# Connect hybrid nodes with Bottlerocket
<a name="hybrid-nodes-bottlerocket"></a>

This topic describes how to connect hybrid nodes running Bottlerocket to an Amazon EKS cluster. [Bottlerocket](https://aws.amazon.com/bottlerocket/) is an open source Linux distribution that is sponsored and supported by AWS. Bottlerocket is purpose-built for hosting container workloads. With Bottlerocket, you can improve the availability of containerized deployments and reduce operational costs by automating updates to your container infrastructure. Bottlerocket includes only the essential software to run containers, which improves resource usage, reduces security threats, and lowers management overhead.

Only VMware variants of Bottlerocket version v1.37.0 and above are supported with EKS Hybrid Nodes. VMware variants of Bottlerocket are available for Kubernetes versions v1.28 and above. The OS images for these variants include the kubelet, containerd, aws-iam-authenticator and other software prerequisites for EKS Hybrid Nodes. You can configure these components using a Bottlerocket [settings](https://github.com/bottlerocket-os/bottlerocket#settings) file that includes base64 encoded user-data for the Bottlerocket bootstrap and admin containers. Configuring these settings enables Bottlerocket to use your hybrid nodes credentials provider to authenticate hybrid nodes to your cluster. After your hybrid nodes join the cluster, they will appear with status `Not Ready` in the Amazon EKS console and in Kubernetes-compatible tooling such as `kubectl`. After completing the steps on this page, proceed to [Configure CNI for hybrid nodes](hybrid-nodes-cni.md) to make your hybrid nodes ready to run applications.

## Prerequisites
<a name="_prerequisites"></a>

Before connecting hybrid nodes to your Amazon EKS cluster, make sure you have completed the prerequisite steps.
+ You have network connectivity from your on-premises environment to the AWS Region hosting your Amazon EKS cluster. See [Prepare networking for hybrid nodes](hybrid-nodes-networking.md) for more information.
+ You have created your Hybrid Nodes IAM role and set up your on-premises credential provider (AWS Systems Manager hybrid activations or AWS IAM Roles Anywhere). See [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md) for more information.
+ You have created your hybrid nodes-enabled Amazon EKS cluster. See [Create an Amazon EKS cluster with hybrid nodes](hybrid-nodes-cluster-create.md) for more information.
+ You have associated your Hybrid Nodes IAM role with Kubernetes Role-Based Access Control (RBAC) permissions. See [Prepare cluster access for hybrid nodes](hybrid-nodes-cluster-prep.md) for more information.

## Step 1: Create the Bottlerocket settings TOML file
<a name="_step_1_create_the_bottlerocket_settings_toml_file"></a>

To configure Bottlerocket for hybrid nodes, you need to create a `settings.toml` file with the necessary configuration. The contents of the TOML file will differ based on the credential provider you are using (SSM or IAM Roles Anywhere). This file will be passed as user data when provisioning the Bottlerocket instance.

**Note**  
The TOML files provided below represent only the minimum required settings for initializing a Bottlerocket VMware machine as a node on an EKS cluster. Bottlerocket provides a wide range of settings to address several different use cases. For configuration options beyond hybrid node initialization, refer to the [Bottlerocket documentation](https://bottlerocket.dev/en) for the comprehensive list of all documented settings for the Bottlerocket version you are using (for example, [here](https://bottlerocket.dev/en/os/1.51.x/api/settings-index) are all the settings available for Bottlerocket 1.51.x).

### SSM
<a name="_ssm"></a>

If you are using AWS Systems Manager as your credential provider, create a `settings.toml` file with the following content:

```
[settings.kubernetes]
cluster-name = "<cluster-name>"
api-server = "<api-server-endpoint>"
cluster-certificate = "<cluster-certificate-authority>"
hostname-override = "<hostname>"
provider-id = "eks-hybrid:///<region>/<cluster-name>/<hostname>"
authentication-mode = "aws"
cloud-provider = ""
server-tls-bootstrap = true

[settings.network]
hostname = "<hostname>"

[settings.aws]
region = "<region>"

[settings.kubernetes.credential-providers.ecr-credential-provider]
enabled = true
cache-duration = "12h"
image-patterns = [
    "*.dkr.ecr.*.amazonaws.com",
    "*.dkr.ecr.*.amazonaws.com.rproxy.goskope.com.cn",
    "*.dkr.ecr.*.amazonaws.eu",
    "*.dkr.ecr-fips.*.amazonaws.com",
    "*.dkr.ecr-fips.*.amazonaws.eu",
    "public.ecr.aws"
]

[settings.kubernetes.node-labels]
"eks.amazonaws.com/compute-type" = "hybrid"
"eks.amazonaws.com/hybrid-credential-provider" = "ssm"

[settings.host-containers.admin]
enabled = true
user-data = "<base64-encoded-admin-container-userdata>"

[settings.bootstrap-containers.eks-hybrid-setup]
mode = "always"
user-data = "<base64-encoded-bootstrap-container-userdata>"

[settings.host-containers.control]
enabled = true
```

Replace the placeholders with the following values:
+  `<cluster-name>`: The name of your Amazon EKS cluster.
+  `<api-server-endpoint>`: The API server endpoint of your cluster.
+  `<cluster-certificate-authority>`: The base64-encoded CA bundle of your cluster.
+  `<region>`: The AWS Region hosting your cluster, for example "us-east-1".
+  `<hostname>`: The hostname of the Bottlerocket instance, which will also be configured as the node name. This can be any unique value of your choice, but must follow the [Kubernetes Object naming conventions](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names). In addition, the hostname you use cannot be longer than 64 characters. NOTE: When using SSM provider, this hostname and node name will be replaced by the managed instance ID (for example, `mi-*` ID) after the instance has been registered with SSM.
+  `<base64-encoded-admin-container-userdata>`: The base64-encoded contents of the Bottlerocket admin container configuration. Enabling the admin container allows you to connect to your Bottlerocket instance with SSH for system exploration and debugging. While this is not a required setting, we recommend enabling it for ease of troubleshooting. Refer to the [Bottlerocket admin container documentation](https://github.com/bottlerocket-os/bottlerocket-admin-container#authenticating-with-the-admin-container) for more information on authenticating with the admin container. The admin container takes SSH user and key input in JSON format, for example,

```
{
  "user": "<ssh-user>",
  "ssh": {
    "authorized-keys": [
      "<ssh-authorized-key>"
    ]
  }
}
```
+  `<base64-encoded-bootstrap-container-userdata>`: The base64-encoded contents of the Bottlerocket bootstrap container configuration. Refer to the [Bottlerocket bootstrap container documentation](https://github.com/bottlerocket-os/bottlerocket-bootstrap-container) for more information on its configuration. The bootstrap container is responsible for registering the instance as an AWS SSM Managed Instance and joining it as a Kubernetes node on your Amazon EKS Cluster. The user data passed into the bootstrap container takes the form of a command invocation which accepts as input the SSM hybrid activation code and ID you previously created:

```
eks-hybrid-ssm-setup --activation-id=<activation-id> --activation-code=<activation-code> --region=<region>
```
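
The base64-encoded values for the admin and bootstrap container user data can be generated with the `base64` utility. The following is a minimal sketch that assumes a hypothetical `admin-userdata.json` file containing the admin container JSON shown above; the bootstrap user data is the `eks-hybrid-ssm-setup` command invocation shown above.

```
# Encode the admin container user data from a hypothetical JSON file.
base64 -w0 admin-userdata.json

# Encode the bootstrap container command invocation.
echo 'eks-hybrid-ssm-setup --activation-id=<activation-id> --activation-code=<activation-code> --region=<region>' | base64 -w0
```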

### IAM Roles Anywhere
<a name="_iam_roles_anywhere"></a>

If you are using AWS IAM Roles Anywhere as your credential provider, create a `settings.toml` file with the following content:

```
[settings.kubernetes]
cluster-name = "<cluster-name>"
api-server = "<api-server-endpoint>"
cluster-certificate = "<cluster-certificate-authority>"
hostname-override = "<hostname>"
provider-id = "eks-hybrid:///<region>/<cluster-name>/<hostname>"
authentication-mode = "aws"
cloud-provider = ""
server-tls-bootstrap = true

[settings.network]
hostname = "<hostname>"

[settings.aws]
region = "<region>"
config = "<base64-encoded-aws-config-file>"

[settings.kubernetes.credential-providers.ecr-credential-provider]
enabled = true
cache-duration = "12h"
image-patterns = [
    "*.dkr.ecr.*.amazonaws.com",
    "*.dkr.ecr.*.amazonaws.com.rproxy.goskope.com.cn",
    "*.dkr.ecr.*.amazonaws.eu",
    "*.dkr.ecr-fips.*.amazonaws.com",
    "*.dkr.ecr-fips.*.amazonaws.eu",
    "public.ecr.aws"
]

[settings.kubernetes.node-labels]
"eks.amazonaws.com/compute-type" = "hybrid"
"eks.amazonaws.com/hybrid-credential-provider" = "iam-ra"

[settings.host-containers.admin]
enabled = true
user-data = "<base64-encoded-admin-container-userdata>"

[settings.bootstrap-containers.eks-hybrid-setup]
mode = "always"
user-data = "<base64-encoded-bootstrap-container-userdata>"
```

Replace the placeholders with the following values:
+  `<cluster-name>`: The name of your Amazon EKS cluster.
+  `<api-server-endpoint>`: The API server endpoint of your cluster.
+  `<cluster-certificate-authority>`: The base64-encoded CA bundle of your cluster.
+  `<region>`: The AWS Region hosting your cluster, for example "us-east-1".
+  `<hostname>`: The hostname of the Bottlerocket instance, which will also be configured as the node name. This can be any unique value of your choice, but must follow the [Kubernetes Object naming conventions](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names). In addition, the hostname you use cannot be longer than 64 characters. NOTE: When using IAM-RA provider, the node name must match the CN of the certificate on the host if you configured the trust policy of your Hybrid Nodes IAM role with the `"sts:RoleSessionName": "${aws:PrincipalTag/x509Subject/CN}"` resource condition.
+  `<base64-encoded-aws-config-file>`: The base64-encoded contents of your AWS config file. The contents of the file should be as follows:

```
[default]
credential_process = aws_signing_helper credential-process --certificate /root/.aws/node.crt --private-key /root/.aws/node.key --profile-arn <profile-arn> --role-arn <role-arn> --trust-anchor-arn <trust-anchor-arn> --role-session-name <role-session-name>
```
+  `<base64-encoded-admin-container-userdata>`: The base64-encoded contents of the Bottlerocket admin container configuration. Enabling the admin container allows you to connect to your Bottlerocket instance with SSH for system exploration and debugging. While this is not a required setting, we recommend enabling it for ease of troubleshooting. Refer to the [Bottlerocket admin container documentation](https://github.com/bottlerocket-os/bottlerocket-admin-container#authenticating-with-the-admin-container) for more information on authenticating with the admin container. The admin container takes SSH user and key input in JSON format, for example,

```
{
  "user": "<ssh-user>",
  "ssh": {
    "authorized-keys": [
      "<ssh-authorized-key>"
    ]
  }
}
```
+  `<base64-encoded-bootstrap-container-userdata>`: The base64-encoded contents of the Bottlerocket bootstrap container configuration. Refer to the [Bottlerocket bootstrap container documentation](https://github.com/bottlerocket-os/bottlerocket-bootstrap-container) for more information on its configuration. The bootstrap container is responsible for creating the IAM Roles Anywhere host certificate and certificate private key files on the instance. These will then be consumed by the `aws_signing_helper` to obtain temporary credentials for authenticating with your Amazon EKS cluster. The user data passed into the bootstrap container takes the form of a command invocation which accepts as input the contents of the certificate and private key you previously created:

```
eks-hybrid-iam-ra-setup --certificate=<certificate> --key=<private-key>
```

## Step 2: Provision the Bottlerocket vSphere VM with user data
<a name="_step_2_provision_the_bottlerocket_vsphere_vm_with_user_data"></a>

Once you have constructed the TOML file, pass it as user data during vSphere VM creation. Keep in mind that the user data must be configured before the VM is powered on for the first time. You can supply it when creating the VM, or, if you create the VM ahead of time, the VM must remain in the `poweredOff` state until you configure its user data. For example, if you are using the `govc` CLI:

### Creating VM for the first time
<a name="_creating_vm_for_the_first_time"></a>

```
govc vm.create \
  -on=true \
  -c=2 \
  -m=4096 \
  -net.adapter=<network-adapter> \
  -net=<network-name> \
  -e guestinfo.userdata.encoding="base64" \
  -e guestinfo.userdata="$(base64 -w0 settings.toml)" \
  -template=<template-name> \
  <vm-name>
```

### Updating user data for an existing VM
<a name="_updating_user_data_for_an_existing_vm"></a>

```
govc vm.create \
    -on=false \
    -c=2 \
    -m=4096 \
    -net.adapter=<network-adapter> \
    -net=<network-name> \
    -template=<template-name> \
    <vm-name>

govc vm.change \
    -vm <vm-name> \
    -e guestinfo.userdata="$(base64 -w0 settings.toml)" \
    -e guestinfo.userdata.encoding="base64"

govc vm.power -on <vm-name>
```

In the above sections, the `-e guestinfo.userdata.encoding="base64"` option specifies that the user data is base64-encoded. The `-e guestinfo.userdata` option passes the base64-encoded contents of the `settings.toml` file as user data to the Bottlerocket instance. Replace the placeholders with your specific values, such as the Bottlerocket OVA template and networking details.

## Step 3: Verify the hybrid node connection
<a name="_step_3_verify_the_hybrid_node_connection"></a>

After the Bottlerocket instance starts, it will attempt to join your Amazon EKS cluster. You can verify the connection in the Amazon EKS console by navigating to the Compute tab for your cluster or by running the following command:

```
kubectl get nodes
```

**Important**  
Your nodes will have status `Not Ready`, which is expected and is due to the lack of a CNI running on your hybrid nodes. If your nodes did not join the cluster, see [Troubleshooting hybrid nodes](hybrid-nodes-troubleshooting.md).

## Step 4: Configure a CNI for hybrid nodes
<a name="_step_4_configure_a_cni_for_hybrid_nodes"></a>

To make your hybrid nodes ready to run applications, continue with the steps on [Configure CNI for hybrid nodes](hybrid-nodes-cni.md).

# Upgrade hybrid nodes for your cluster
<a name="hybrid-nodes-upgrade"></a>

The guidance for upgrading hybrid nodes is similar to that for self-managed Amazon EKS nodes that run in Amazon EC2. We recommend that you create new hybrid nodes on your target Kubernetes version, gracefully migrate your existing applications to the hybrid nodes on the new Kubernetes version, and remove the hybrid nodes on the old Kubernetes version from your cluster. Be sure to review the [Amazon EKS Best Practices](https://docs.aws.amazon.com/eks/latest/best-practices/cluster-upgrades.html) for upgrades before initiating an upgrade. Amazon EKS Hybrid Nodes have the same [Kubernetes version support](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html) as Amazon EKS clusters with cloud nodes, including standard and extended support.

Amazon EKS Hybrid Nodes follow the same [version skew policy](https://kubernetes.io/releases/version-skew-policy/#supported-version-skew) for nodes as upstream Kubernetes. Amazon EKS Hybrid Nodes cannot be on a newer version than the Amazon EKS control plane, and hybrid nodes may be up to three Kubernetes minor versions older than the Amazon EKS control plane minor version. For example, if your Amazon EKS control plane is on Kubernetes version `1.31`, hybrid nodes can run Kubernetes versions `1.28` through `1.31`.

If you do not have spare capacity to create new hybrid nodes on your target Kubernetes version for a cutover migration upgrade strategy, you can alternatively use the Amazon EKS Hybrid Nodes CLI (`nodeadm`) to upgrade the Kubernetes version of your hybrid nodes in-place.

**Important**  
If you are upgrading your hybrid nodes in-place with `nodeadm`, there is downtime for the node during the process where the older version of the Kubernetes components are shut down and the new Kubernetes version components are installed and started.

## Prerequisites
<a name="_prerequisites"></a>

Before upgrading, make sure you have completed the following prerequisites.
+ The target Kubernetes version for your hybrid nodes upgrade must be equal to or less than the Amazon EKS control plane version.
+ If you are following a cutover migration upgrade strategy, the new hybrid nodes you are installing on your target Kubernetes version must meet the [Prerequisite setup for hybrid nodes](hybrid-nodes-prereqs.md) requirements. This includes having IP addresses within the Remote Node Network CIDR you passed during Amazon EKS cluster creation.
+ For both cutover migration and in-place upgrades, the hybrid nodes must have access to the [required domains](hybrid-nodes-networking.md#hybrid-nodes-networking-on-prem) to pull the new versions of the hybrid nodes dependencies.
+ You must have kubectl installed on your local machine or instance you are using to interact with your Amazon EKS Kubernetes API endpoint.
+ The version of your CNI must support the Kubernetes version you are upgrading to. If it does not, upgrade your CNI version before upgrading your hybrid nodes. See [Configure CNI for hybrid nodes](hybrid-nodes-cni.md) for more information.

## Cutover migration (blue-green) upgrades
<a name="hybrid-nodes-upgrade-cutover"></a>

 *Cutover migration upgrades* refer to the process of creating new hybrid nodes on new hosts with your target Kubernetes version, gracefully migrating your existing applications to the new hybrid nodes on your target Kubernetes version, and removing the hybrid nodes on the old Kubernetes version from your cluster. This strategy is also called a blue-green migration.

1. Connect your new hosts as hybrid nodes following the [Connect hybrid nodes](hybrid-nodes-join.md) steps. When running the `nodeadm install` command, use your target Kubernetes version.

1. Enable communication between the new hybrid nodes on the target Kubernetes version and your hybrid nodes on the old Kubernetes version. This configuration allows pods to communicate with each other while you are migrating your workload to the hybrid nodes on the target Kubernetes version.

1. Confirm that your hybrid nodes on your target Kubernetes version successfully joined your cluster and have the `Ready` status.

1. Use the following command to mark each of the nodes that you want to remove as unschedulable. This is so that new pods aren’t scheduled or rescheduled on the nodes that you are replacing. For more information, see [kubectl cordon](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_cordon/) in the Kubernetes documentation. Replace `NODE_NAME` with the name of the hybrid nodes on the old Kubernetes version.

   ```
   kubectl cordon NODE_NAME
   ```

   You can identify and cordon all of the nodes of a particular Kubernetes version (in this case, `1.28`) with the following code snippet.

   ```
   K8S_VERSION=1.28
   for node in $(kubectl get nodes -o json | jq --arg K8S_VERSION "$K8S_VERSION" -r '.items[] | select(.status.nodeInfo.kubeletVersion | match("\($K8S_VERSION)")).metadata.name')
   do
       echo "Cordoning $node"
       kubectl cordon $node
   done
   ```

1. If your current deployment is running fewer than two CoreDNS replicas on your hybrid nodes, scale out the deployment to at least two replicas. We recommend that you run at least two CoreDNS replicas on hybrid nodes for resiliency during normal operations.

   ```
   kubectl scale deployments/coredns --replicas=2 -n kube-system
   ```

1. Drain each of the hybrid nodes on the old Kubernetes version that you want to remove from your cluster with the following command. For more information on draining nodes, see [Safely Drain a Node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) in the Kubernetes documentation. Replace `NODE_NAME` with the name of the hybrid nodes on the old Kubernetes version.

   ```
   kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data
   ```

   You can identify and drain all of the nodes of a particular Kubernetes version (in this case, `1.28`) with the following code snippet.

   ```
   K8S_VERSION=1.28
   for node in $(kubectl get nodes -o json | jq --arg K8S_VERSION "$K8S_VERSION" -r '.items[] | select(.status.nodeInfo.kubeletVersion | match("\($K8S_VERSION)")).metadata.name')
   do
       echo "Draining $node"
       kubectl drain $node --ignore-daemonsets --delete-emptydir-data
   done
   ```

1. You can use `nodeadm` to stop and remove the hybrid nodes artifacts from the host. You must run `nodeadm` with a user that has root/sudo privileges. By default, `nodeadm uninstall` will not proceed if there are pods remaining on the node. For more information see [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md).

   ```
   nodeadm uninstall
   ```

1. With the hybrid nodes artifacts stopped and uninstalled, remove the node resource from your cluster.

   ```
   kubectl delete node NODE_NAME
   ```

   You can identify and delete all of the nodes of a particular Kubernetes version (in this case, `1.28`) with the following code snippet.

   ```
   K8S_VERSION=1.28
   for node in $(kubectl get nodes -o json | jq --arg K8S_VERSION "$K8S_VERSION" -r '.items[] | select(.status.nodeInfo.kubeletVersion | match("\($K8S_VERSION)")).metadata.name')
   do
       echo "Deleting $node"
       kubectl delete node $node
   done
   ```

1. Depending on your choice of CNI, there may be artifacts remaining on your hybrid nodes after running the above steps. See [Configure CNI for hybrid nodes](hybrid-nodes-cni.md) for more information.

## In-place upgrades
<a name="hybrid-nodes-upgrade-inplace"></a>

The in-place upgrade process refers to using `nodeadm upgrade` to upgrade the Kubernetes version for hybrid nodes without using new physical or virtual hosts and a cutover migration strategy. The `nodeadm upgrade` process shuts down the existing older Kubernetes components running on the hybrid node, uninstalls the existing older Kubernetes components, installs the new target Kubernetes components, and starts the new target Kubernetes components. We strongly recommend that you upgrade one node at a time to minimize impact to applications running on the hybrid nodes. The duration of this process depends on your network bandwidth and latency.

1. Use the following command to mark the node you are upgrading as unschedulable. This is so that new pods aren’t scheduled or rescheduled on the node that you are upgrading. For more information, see [kubectl cordon](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_cordon/) in the Kubernetes documentation. Replace `NODE_NAME` with the name of the hybrid node you are upgrading.

   ```
   kubectl cordon NODE_NAME
   ```

1. Drain the node you are upgrading with the following command. For more information on draining nodes, see [Safely Drain a Node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) in the Kubernetes documentation. Replace `NODE_NAME` with the name of the hybrid node you are upgrading.

   ```
   kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data
   ```

1. Run `nodeadm upgrade` on the hybrid node you are upgrading. You must run `nodeadm` with a user that has root/sudo privileges. The name of the node is preserved through upgrade for both AWS SSM and AWS IAM Roles Anywhere credential providers. You cannot change credential providers during the upgrade process. See [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md) for configuration values for `nodeConfig.yaml`. Replace `K8S_VERSION` with the target Kubernetes version you are upgrading to.

   ```
   nodeadm upgrade K8S_VERSION -c file://nodeConfig.yaml
   ```

1. To allow pods to be scheduled on the node after you have upgraded, type the following. Replace `NODE_NAME` with the name of the node.

   ```
   kubectl uncordon NODE_NAME
   ```

1. Watch the status of your hybrid nodes and wait for your nodes to shut down and restart on the new Kubernetes version with the `Ready` status.

   ```
   kubectl get nodes -o wide -w
   ```

# Patch security updates for hybrid nodes
<a name="hybrid-nodes-security"></a>

This topic describes the procedure to perform in-place patching of security updates for specific packages and dependencies running on your hybrid nodes. As a best practice, we recommend that you regularly update your hybrid nodes to receive CVE fixes and security patches.

For steps to upgrade the Kubernetes version, see [Upgrade hybrid nodes for your cluster](hybrid-nodes-upgrade.md).

One example of software that might need security patching is `containerd`.

## `Containerd`
<a name="_containerd"></a>

 `containerd` is the standard Kubernetes container runtime and a core dependency for EKS Hybrid Nodes, used for managing the container lifecycle, including pulling images and managing container execution. On a hybrid node, you can install `containerd` through the [nodeadm CLI](https://docs.aws.amazon.com/eks/latest/userguide/hybrid-nodes-nodeadm.html) or manually. Depending on the operating system of your node, `nodeadm` installs `containerd` from the OS-distributed package or the Docker package.

When a CVE in `containerd` has been published, you have the following options to upgrade to the patched version of `containerd` on your hybrid nodes.

## Step 1: Check if the patch is published to package managers
<a name="_step_1_check_if_the_patch_published_to_package_managers"></a>

You can check whether the `containerd` CVE patch has been published to each respective OS package manager by referring to the corresponding security bulletins:
+  [Amazon Linux 2023](https://alas.aws.amazon.com/alas2023.html) 
+  [RHEL](https://access.redhat.com/security/security-updates/security-advisories) 
+  [Ubuntu 20.04](https://ubuntu.com/security/notices?order=newest&release=focal) 
+  [Ubuntu 22.04](https://ubuntu.com/security/notices?order=newest&release=jammy) 
+  [Ubuntu 24.04](https://ubuntu.com/security/notices?order=newest&release=noble) 

If you use the Docker repo as the source of `containerd`, you can check the [Docker security announcements](https://docs.docker.com/security/security-announcements/) to identify the availability of the patched version in the Docker repo.

## Step 2: Choose the method to install the patch
<a name="_step_2_choose_the_method_to_install_the_patch"></a>

There are three methods to patch and install security upgrades in place on your nodes. The method you can use depends on whether the patch is available from your operating system’s package manager:

1. Install patches with `nodeadm upgrade` that are published to package managers, see [Step 2 a](#hybrid-nodes-security-nodeadm).

1. Install patches with the package managers directly, see [Step 2 b](#hybrid-nodes-security-package).

1. Install custom patches that aren’t published in package managers. Note that there are special considerations for custom patches for `containerd`, [Step 2 c](#hybrid-nodes-security-manual).

## Step 2 a: Patching with `nodeadm upgrade`
<a name="hybrid-nodes-security-nodeadm"></a>

After you confirm that the `containerd` CVE patch has been published to the OS or Docker repos (either Apt or RPM), you can use the `nodeadm upgrade` command to upgrade to the latest version of `containerd`. Since this isn’t a Kubernetes version upgrade, you must pass in your current Kubernetes version to the `nodeadm` upgrade command.

```
nodeadm upgrade K8S_VERSION --config-source file:///root/nodeConfig.yaml
```

## Step 2 b: Patching with operating system package managers
<a name="hybrid-nodes-security-package"></a>

Alternatively, you can use the respective operating system package manager to upgrade the `containerd` package, as follows.

 **Amazon Linux 2023** 

```
sudo yum update -y
sudo yum install -y containerd
```

 **RHEL** 

```
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
sudo yum update -y
sudo yum install -y containerd
```

 **Ubuntu** 

```
sudo mkdir -p /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update -y
sudo apt install -y --only-upgrade containerd.io
```

## Step 2 c: `Containerd` CVE patch not published in package managers
<a name="hybrid-nodes-security-manual"></a>

If the patched `containerd` version is available only by other means, for example in GitHub releases rather than in the package manager, you can install `containerd` from the official GitHub releases.

1. If the machine has already joined the cluster as a hybrid node, then you need to run the `nodeadm uninstall` command.

1. Install the official `containerd` binaries. You can use the [official installation steps](https://github.com/containerd/containerd/blob/main/docs/getting-started.md#option-1-from-the-official-binaries) on GitHub.

1. Run the `nodeadm install` command with the `--containerd-source` argument set to `none`, which skips the `containerd` installation through `nodeadm`. You can use the value `none` for the `containerd` source regardless of the operating system that the node is running.

   ```
   nodeadm install K8S_VERSION --credential-provider CREDS_PROVIDER --containerd-source none
   ```

# Remove hybrid nodes
<a name="hybrid-nodes-remove"></a>

This topic describes how to delete hybrid nodes from your Amazon EKS cluster. You must delete your hybrid nodes with your choice of Kubernetes-compatible tooling such as [kubectl](https://kubernetes.io/docs/reference/kubectl/). Charges for hybrid nodes stop when the node object is removed from the Amazon EKS cluster. For more information on hybrid nodes pricing, see [Amazon EKS Pricing](https://aws.amazon.com/eks/pricing/).

**Important**  
Removing nodes is disruptive to workloads running on the node. Before deleting hybrid nodes, we recommend that you first drain the node to move pods to another active node. For more information on draining nodes, see [Safely Drain a Node](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) in the Kubernetes documentation.

Run the kubectl steps below from your local machine or instance that you use to interact with the Amazon EKS cluster’s Kubernetes API endpoint. If you are using a specific `kubeconfig` file, use the `--kubeconfig` flag.
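
For example, a quick check that your cluster access and `kubeconfig` are working, assuming a hypothetical `kubeconfig` path:

```
kubectl get nodes --kubeconfig /path/to/kubeconfig
```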

## Step 1: List your nodes
<a name="_step_1_list_your_nodes"></a>

```
kubectl get nodes
```

## Step 2: Drain your node
<a name="_step_2_drain_your_node"></a>

See [kubectl drain](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_drain/) in the Kubernetes documentation for more information on the `kubectl drain` command.

```
kubectl drain --ignore-daemonsets <node-name>
```

## Step 3: Stop and uninstall hybrid nodes artifacts
<a name="_step_3_stop_and_uninstall_hybrid_nodes_artifacts"></a>

You can use the Amazon EKS Hybrid Nodes CLI (`nodeadm`) to stop and remove the hybrid nodes artifacts from the host. You must run `nodeadm` with a user that has root/sudo privileges. By default, `nodeadm uninstall` will not proceed if there are pods remaining on the node. If you are using AWS Systems Manager (SSM) as your credentials provider, the `nodeadm uninstall` command deregisters the host as an AWS SSM managed instance. For more information, see [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md).

```
nodeadm uninstall
```

## Step 4: Delete your node from the cluster
<a name="_step_4_delete_your_node_from_the_cluster"></a>

With the hybrid nodes artifacts stopped and uninstalled, remove the node resource from your cluster.

```
kubectl delete node <node-name>
```

## Step 5: Check for remaining artifacts
<a name="_step_5_check_for_remaining_artifacts"></a>

Depending on your choice of CNI, there may be artifacts remaining on your hybrid nodes after running the above steps. See [Configure CNI for hybrid nodes](hybrid-nodes-cni.md) for more information.

# Configure application networking, add-ons, and webhooks for hybrid nodes
<a name="hybrid-nodes-configure"></a>

After you create an EKS cluster for hybrid nodes, configure additional capabilities for application networking (CNI, BGP, Ingress, Load Balancing, Network Policies), add-ons, webhooks, and proxy settings. For the complete list of the EKS and community add-ons that are compatible with hybrid nodes, see [Configure add-ons for hybrid nodes](hybrid-nodes-add-ons.md).

**EKS cluster insights** – EKS includes insight checks for misconfigurations in your hybrid nodes setup that could impair functionality of your cluster or workloads. For more information on cluster insights, see [Prepare for Kubernetes version upgrades and troubleshoot misconfigurations with cluster insights](cluster-insights.md).

The following lists the common capabilities and add-ons that you can use with hybrid nodes:
+  **Container Networking Interface (CNI)**: AWS supports [Cilium](https://docs.cilium.io/en/stable/index.html) as the CNI for hybrid nodes. For more information, see [Configure CNI for hybrid nodes](hybrid-nodes-cni.md). Note that the AWS VPC CNI can’t be used with hybrid nodes.
+  **CoreDNS and `kube-proxy`**: CoreDNS and `kube-proxy` are installed automatically when hybrid nodes join the EKS cluster. These add-ons can be managed as EKS add-ons after cluster creation.
+  **Ingress and Load Balancing**: You can use the AWS Load Balancer Controller and Application Load Balancer (ALB) or Network Load Balancer (NLB) with the target type `ip` for workloads running on hybrid nodes. AWS supports Cilium’s built-in Ingress, Gateway, and Kubernetes Service load balancing features for workloads running on hybrid nodes. For more information, see [Configure Kubernetes Ingress for hybrid nodes](hybrid-nodes-ingress.md) and [Configure Services of type LoadBalancer for hybrid nodes](hybrid-nodes-load-balancing.md).
+  **Metrics**: You can use Amazon Managed Service for Prometheus (AMP) agent-less scrapers, AWS Distro for Open Telemetry (ADOT), and the Amazon CloudWatch Observability Agent with hybrid nodes. To use AMP agent-less scrapers for pod metrics on hybrid nodes, your pods must be accessible from the VPC that you use for the EKS cluster.
+  **Logs**: You can enable EKS control plane logging for hybrid nodes-enabled clusters. You can use the ADOT EKS add-on and the Amazon CloudWatch Observability Agent EKS add-on for hybrid node and pod logging.
+  **Pod Identities and IRSA**: You can use EKS Pod Identities and IAM Roles for Service Accounts (IRSA) with applications running on hybrid nodes to enable granular access for your pods running on hybrid nodes with other AWS services.
+  **Webhooks**: If you are running webhooks, see [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md) for considerations and steps to optionally run webhooks on cloud nodes if you cannot make your on-premises pod networks routable.
+  **Proxy**: If you are using a proxy server in your on-premises environment for traffic leaving your data center or edge environment, you can configure your hybrid nodes and cluster to use your proxy server. For more information, see [Configure proxy for hybrid nodes](hybrid-nodes-proxy.md).

**Topics**
+ [Configure CNI for hybrid nodes](hybrid-nodes-cni.md)
+ [Configure add-ons for hybrid nodes](hybrid-nodes-add-ons.md)
+ [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md)
+ [Configure proxy for hybrid nodes](hybrid-nodes-proxy.md)
+ [Configure Cilium BGP for hybrid nodes](hybrid-nodes-cilium-bgp.md)
+ [Configure Kubernetes Ingress for hybrid nodes](hybrid-nodes-ingress.md)
+ [Configure Services of type LoadBalancer for hybrid nodes](hybrid-nodes-load-balancing.md)
+ [Configure Kubernetes Network Policies for hybrid nodes](hybrid-nodes-network-policies.md)

# Configure CNI for hybrid nodes
<a name="hybrid-nodes-cni"></a>

Cilium is the AWS-supported Container Networking Interface (CNI) for Amazon EKS Hybrid Nodes. You must install a CNI for hybrid nodes to become ready to serve workloads. Hybrid nodes appear with status `Not Ready` until a CNI is running. You can manage the CNI with your choice of tools such as Helm. The instructions on this page cover Cilium lifecycle management (install, upgrade, delete). See [Cilium Ingress and Cilium Gateway Overview](hybrid-nodes-ingress.md#hybrid-nodes-ingress-cilium), [Service type LoadBalancer](hybrid-nodes-ingress.md#hybrid-nodes-ingress-cilium-loadbalancer), and [Configure Kubernetes Network Policies for hybrid nodes](hybrid-nodes-network-policies.md) for how to configure Cilium for ingress, load balancing, and network policies.

Cilium is not supported by AWS when running on nodes in AWS Cloud. The Amazon VPC CNI is not compatible with hybrid nodes and is configured with anti-affinity for the `eks.amazonaws.com/compute-type: hybrid` label.

The Calico documentation previously on this page has been moved to the [EKS Hybrid Examples Repository](https://github.com/aws-samples/eks-hybrid-examples).

## Version compatibility
<a name="hybrid-nodes-cilium-version-compatibility"></a>

Cilium versions `v1.17.x` and `v1.18.x` are supported for EKS Hybrid Nodes for every Kubernetes version supported in Amazon EKS.

**Note**  
 **Cilium v1.18.3 kernel requirement**: Due to the kernel requirement (Linux kernel >= 5.10), Cilium v1.18.3 is not supported on:
+ Ubuntu 20.04
+ Red Hat Enterprise Linux (RHEL) 8

For system requirements, see [Cilium system requirements](https://docs.cilium.io/en/stable/operations/system_requirements/).
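
Before installing Cilium v1.18.3, you can confirm that each hybrid node meets the kernel requirement:

```
uname -r
```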

See [Kubernetes version support](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html) for the Kubernetes versions supported by Amazon EKS. EKS Hybrid Nodes have the same Kubernetes version support as Amazon EKS clusters with cloud nodes.

## Supported capabilities
<a name="hybrid-nodes-cilium-support"></a>

 AWS maintains builds of Cilium for EKS Hybrid Nodes that are based on the open source [Cilium project](https://github.com/cilium/cilium). To receive support from AWS for Cilium, you must be using the AWS-maintained Cilium builds and supported Cilium versions.

 AWS provides technical support for the default configurations of the following capabilities of Cilium for use with EKS Hybrid Nodes. If you plan to use functionality outside the scope of AWS support, it is recommended to obtain alternative commercial support for Cilium or have the in-house expertise to troubleshoot and contribute fixes to the Cilium project.


| Cilium Feature | Supported by AWS  | 
| --- | --- | 
|  Kubernetes network conformance  |  Yes  | 
|  Core cluster connectivity  |  Yes  | 
|  IP family  |  IPv4  | 
|  Lifecycle Management  |  Helm  | 
|  Networking Mode  |  VXLAN encapsulation  | 
|  IP Address Management (IPAM)  |  Cilium IPAM Cluster Scope  | 
|  Network Policy  |  Kubernetes Network Policy  | 
|  Border Gateway Protocol (BGP)  |  Cilium BGP Control Plane  | 
|  Kubernetes Ingress  |  Cilium Ingress, Cilium Gateway  | 
|  Service LoadBalancer IP Allocation  |  Cilium Load Balancer IPAM  | 
|  Service LoadBalancer IP Address Advertisement  |  Cilium BGP Control Plane  | 
|  kube-proxy replacement  |  Yes  | 

## Cilium considerations
<a name="hybrid-nodes-cilium-considerations"></a>
+  **Helm repository** - AWS hosts the Cilium Helm chart in the Amazon Elastic Container Registry Public (Amazon ECR Public) at [Amazon EKS Cilium/Cilium](https://gallery.ecr.aws/eks/cilium/cilium). The available versions include:
  + Cilium v1.17.9: `oci://public.ecr.aws/eks/cilium/cilium:1.17.9-0` 
  + Cilium v1.18.3: `oci://public.ecr.aws/eks/cilium/cilium:1.18.3-0` 

    The commands in this topic use this repository. Note that certain `helm repo` commands aren’t valid for Helm repositories hosted in Amazon ECR Public, so you can’t refer to this repository by a local Helm repo name. Instead, use the full URI in most commands, as shown in the example after this list.
+ By default, Cilium is configured to run in overlay / tunnel mode with VXLAN as the [encapsulation method](https://docs.cilium.io/en/stable/network/concepts/routing/#encapsulation). This mode has the fewest requirements on the underlying physical network.
+ By default, Cilium [masquerades](https://docs.cilium.io/en/stable/network/concepts/masquerading/) the source IP address of all pod traffic leaving the cluster to the IP address of the node. If you disable masquerading, then your pod CIDRs must be routable on your on-premises network.
+ If you are running webhooks on hybrid nodes, your pod CIDRs must be routable on your on-premises network. If your pod CIDRs are not routable on your on-premises network, then it is recommended to run webhooks on cloud nodes in the same cluster. See [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md) and [Prepare networking for hybrid nodes](hybrid-nodes-networking.md) for more information.
+  AWS recommends using Cilium’s built-in BGP functionality to make your pod CIDRs routable on your on-premises network. For more information on how to configure Cilium BGP with hybrid nodes, see [Configure Cilium BGP for hybrid nodes](hybrid-nodes-cilium-bgp.md).
+ The default IP Address Management (IPAM) in Cilium is called [Cluster Scope](https://docs.cilium.io/en/stable/network/concepts/ipam/cluster-pool/), where the Cilium operator allocates IP addresses for each node based on user-configured pod CIDRs.
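
For example, you can inspect the chart metadata for one of the versions listed above by referencing the full OCI URI. This sketch assumes a Helm version with OCI registry support:

```
# show chart metadata directly from the ECR Public registry
helm show chart oci://public.ecr.aws/eks/cilium/cilium --version 1.17.9-0
```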

## Install Cilium on hybrid nodes
<a name="hybrid-nodes-cilium-install"></a>

### Procedure
<a name="_procedure"></a>

1. Create a YAML file called `cilium-values.yaml`. The following example configures Cilium to run only on hybrid nodes by setting affinity for the `eks.amazonaws.com/compute-type: hybrid` label for the Cilium agent and operator.
   + Configure `clusterPoolIPv4PodCIDRList` with the same pod CIDRs that you configured for your EKS cluster’s *remote pod networks*. For example, `10.100.0.0/24`. The Cilium operator allocates IP address slices from within the configured `clusterPoolIPv4PodCIDRList` IP space. Your pod CIDR must not overlap with your on-premises node CIDR, your VPC CIDR, or your Kubernetes service CIDR.
   + Configure `clusterPoolIPv4MaskSize` based on your required pods per node. For example, `25` for a /25 segment size of 128 pods per node.
   + Do not change `clusterPoolIPv4PodCIDRList` or `clusterPoolIPv4MaskSize` after deploying Cilium on your cluster. See [Expanding the cluster pool](https://docs.cilium.io/en/stable/network/concepts/ipam/cluster-pool/#expanding-the-cluster-pool) for more information.
   + If you are running Cilium in kube-proxy replacement mode, set `kubeProxyReplacement: "true"` in your Helm values and ensure you do not have an existing kube-proxy deployment running on the same nodes as Cilium.
   + The example below disables the Envoy Layer 7 (L7) proxy that Cilium uses for L7 network policies and ingress. For more information, see [Configure Kubernetes Network Policies for hybrid nodes](hybrid-nodes-network-policies.md) and [Cilium Ingress and Cilium Gateway Overview](hybrid-nodes-ingress.md#hybrid-nodes-ingress-cilium).
   + The example below sets `loadBalancer.serviceTopology: true` so that Service Traffic Distribution functions correctly if you configure it for your services. For more information, see [Configure Service Traffic Distribution](hybrid-nodes-webhooks.md#hybrid-nodes-mixed-service-traffic-distribution).
   + For a full list of Helm values for Cilium, see the [Helm reference](https://docs.cilium.io/en/stable/helm-reference/) in the Cilium documentation.

     ```
     affinity:
       nodeAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
           - matchExpressions:
             - key: eks.amazonaws.com/compute-type
               operator: In
               values:
               - hybrid
     ipam:
       mode: cluster-pool
       operator:
         clusterPoolIPv4MaskSize: 25
         clusterPoolIPv4PodCIDRList:
         - POD_CIDR
     loadBalancer:
       serviceTopology: true
     operator:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: eks.amazonaws.com/compute-type
                 operator: In
                 values:
                 - hybrid
       unmanagedPodWatcher:
         restart: false
     envoy:
       enabled: false
     kubeProxyReplacement: "false"
     ```

1. Install Cilium on your cluster.
   + Replace `CILIUM_VERSION` with a Cilium version (for example `1.17.9-0` or `1.18.3-0`). It is recommended to use the latest patch version for the Cilium minor version.
   + Ensure your nodes meet the kernel requirements for the version you choose. Cilium v1.18.3 requires Linux kernel >= 5.10.
   + If you are using a specific kubeconfig file, use the `--kubeconfig` flag with the Helm install command.

     ```
     helm install cilium oci://public.ecr.aws/eks/cilium/cilium \
         --version CILIUM_VERSION \
         --namespace kube-system \
         --values cilium-values.yaml
     ```

1. Confirm your Cilium installation was successful with the following commands. You should see the `cilium-operator` deployment and the `cilium-agent` running on each of your hybrid nodes. Additionally, your hybrid nodes should now have status `Ready`. For information on how to configure Cilium BGP to advertise your pod CIDRs to your on-premises network, proceed to [Configure Cilium BGP for hybrid nodes](hybrid-nodes-cilium-bgp.md).

   ```
   kubectl get pods -n kube-system
   ```

   ```
   NAME                              READY   STATUS    RESTARTS   AGE
   cilium-jjjn8                      1/1     Running   0          11m
   cilium-operator-d4f4d7fcb-sc5xn   1/1     Running   0          11m
   ```

   ```
   kubectl get nodes
   ```

   ```
   NAME                   STATUS   ROLES    AGE   VERSION
   mi-04a2cf999b7112233   Ready    <none>   19m   v1.31.0-eks-a737599
   ```

## Upgrade Cilium on hybrid nodes
<a name="hybrid-nodes-cilium-upgrade"></a>

Before upgrading your Cilium deployment, carefully review the [Cilium upgrade documentation](https://docs.cilium.io/en/v1.17/operations/upgrade/) and the upgrade notes to understand the changes in the target Cilium version.

1. Ensure that you have installed the `helm` CLI on your command-line environment. See the [Helm documentation](https://helm.sh/docs/intro/quickstart/) for installation instructions.

1. Run the Cilium upgrade pre-flight check. Replace `CILIUM_VERSION` with your target Cilium version. We recommend that you run the latest patch version for your Cilium minor version. You can find the latest patch release for a given minor Cilium release in the [Stable Releases section](https://github.com/cilium/cilium#stable-releases) of the Cilium README on GitHub.

   ```
   helm install cilium-preflight oci://public.ecr.aws/eks/cilium/cilium --version CILIUM_VERSION \
     --namespace=kube-system \
     --set preflight.enabled=true \
     --set agent=false \
     --set operator.enabled=false
   ```

1. After installing the pre-flight check, ensure that the number of `READY` pods is the same as the number of Cilium pods running.

   ```
   kubectl get ds -n kube-system | sed -n '1p;/cilium/p'
   ```

   ```
   NAME                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
   cilium                    2         2         2       2            2           <none>          1h20m
   cilium-pre-flight-check   2         2         2       2            2           <none>          7m15s
   ```

1. Once the numbers of READY pods are equal, make sure the Cilium pre-flight deployment is also marked as READY 1/1. If it shows READY 0/1, consult the [CNP Validation](https://docs.cilium.io/en/v1.17/operations/upgrade/#cnp-validation) section and resolve any issues with the deployment before continuing with the upgrade.

   ```
   kubectl get deployment -n kube-system cilium-pre-flight-check -w
   ```

   ```
   NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
   cilium-pre-flight-check   1/1     1            0           12s
   ```

1. Delete the pre-flight check.

   ```
   helm uninstall cilium-preflight --namespace kube-system
   ```

1. Before running the `helm upgrade` command, preserve the values for your deployment in an `existing-cilium-values.yaml` file, or use `--set` command-line options for your settings when you run the upgrade command. The upgrade operation overwrites the Cilium ConfigMap, so it is critical that you pass your configuration values when you upgrade.

   ```
   helm get values cilium --namespace kube-system -o yaml > existing-cilium-values.yaml
   ```

1. During normal cluster operations, all Cilium components should run the same version. The following steps describe how to upgrade all of the components from one stable release to a later stable release. When upgrading from one minor release to another minor release, it is recommended to upgrade to the latest patch release for the existing Cilium minor version first. To minimize disruption, set the `upgradeCompatibility` option to the initial Cilium version that you installed in this cluster.

   ```
   helm upgrade cilium oci://public.ecr.aws/eks/cilium/cilium --version CILIUM_VERSION \
     --namespace kube-system \
     --set upgradeCompatibility=1.X \
     -f existing-cilium-values.yaml
   ```

1. (Optional) If you need to roll back your upgrade due to issues, run the following commands.

   ```
   helm history cilium --namespace kube-system
   helm rollback cilium [REVISION] --namespace kube-system
   ```

## Delete Cilium from hybrid nodes
<a name="hybrid-nodes-cilium-delete"></a>

1. Run the following command to uninstall all Cilium components from your cluster. Note that uninstalling the CNI might impact the health of nodes and pods, and it shouldn’t be performed on production clusters.

   ```
   helm uninstall cilium --namespace kube-system
   ```

   The interfaces and routes configured by Cilium are not removed by default when the CNI is removed from the cluster. See the [GitHub issue](https://github.com/cilium/cilium/issues/34289) for more information.

1. If you are using the standard configuration directories, you can clean up the on-disk configuration files and resources by removing the files shown in the [`cni-uninstall.sh` script](https://github.com/cilium/cilium/blob/main/plugins/cilium-cni/cni-uninstall.sh) in the Cilium repository on GitHub.

1. To remove the Cilium Custom Resource Definitions (CRDs) from your cluster, you can run the following commands.

   ```
   kubectl get crds -oname | grep "cilium" | xargs kubectl delete
   ```

# Configure add-ons for hybrid nodes
<a name="hybrid-nodes-add-ons"></a>

This page describes considerations for running AWS add-ons and community add-ons on Amazon EKS Hybrid Nodes. To learn more about Amazon EKS add-ons and the processes for creating, upgrading, and removing add-ons from your cluster, see [Amazon EKS add-ons](eks-add-ons.md). Unless otherwise noted on this page, the processes for creating, upgrading, and removing Amazon EKS add-ons are the same for Amazon EKS clusters with hybrid nodes as they are for Amazon EKS clusters with nodes running in AWS Cloud. Only the add-ons included on this page have been validated for compatibility with Amazon EKS Hybrid Nodes.

The following AWS add-ons are compatible with Amazon EKS Hybrid Nodes.


|  AWS add-on | Compatible add-on versions | 
| --- | --- | 
|  kube-proxy  |  v1.25.14-eksbuild.2 and above  | 
|  CoreDNS  |  v1.9.3-eksbuild.7 and above  | 
|   AWS Distro for OpenTelemetry (ADOT)  |  v0.102.1-eksbuild.2 and above  | 
|  CloudWatch Observability agent  |  v2.2.1-eksbuild.1 and above  | 
|  EKS Pod Identity Agent  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/eks/latest/userguide/hybrid-nodes-add-ons.html)  | 
|  Node monitoring agent  |  v1.2.0-eksbuild.1 and above  | 
|  CSI snapshot controller  |  v8.1.0-eksbuild.1 and above  | 
|   AWS Private CA Connector for Kubernetes  |  v1.6.0-eksbuild.1 and above  | 
|  Amazon FSx CSI driver  |  v1.7.0-eksbuild.1 and above  | 
|   AWS Secrets Store CSI Driver provider  |  v2.1.1-eksbuild.1 and above  | 

The following community add-ons are compatible with Amazon EKS Hybrid Nodes. To learn more about community add-ons, see [Community add-ons](community-addons.md).


| Community add-on | Compatible add-on versions | 
| --- | --- | 
|  Kubernetes Metrics Server  |  v0.7.2-eksbuild.1 and above  | 
|  cert-manager  |  v1.17.2-eksbuild.1 and above  | 
|  Prometheus Node Exporter  |  v1.9.1-eksbuild.2 and above  | 
|  kube-state-metrics  |  v2.15.0-eksbuild.4 and above  | 
|  External DNS  |  v0.19.0-eksbuild.1 and above  | 

In addition to the Amazon EKS add-ons in the tables above, the [Amazon Managed Service for Prometheus Collector](prometheus.md), and the [AWS Load Balancer Controller](aws-load-balancer-controller.md) for [application ingress](alb-ingress.md) (HTTP) and [load balancing](network-load-balancing.md) (TCP/UDP) are compatible with hybrid nodes.

There are AWS add-ons and community add-ons that aren’t compatible with Amazon EKS Hybrid Nodes. The latest versions of these add-ons have an anti-affinity rule for the default `eks.amazonaws.com/compute-type: hybrid` label applied to hybrid nodes. This prevents them from running on hybrid nodes when deployed in your clusters. If you have clusters with both hybrid nodes and nodes running in AWS Cloud, you can deploy these add-ons in your cluster to nodes running in AWS Cloud. The Amazon VPC CNI is not compatible with hybrid nodes, and Cilium and Calico are supported as the Container Networking Interfaces (CNIs) for Amazon EKS Hybrid Nodes. See [Configure CNI for hybrid nodes](hybrid-nodes-cni.md) for more information.

## AWS add-ons
<a name="hybrid-nodes-add-ons-aws-add-ons"></a>

The sections that follow describe differences between running compatible AWS add-ons on hybrid nodes compared to other Amazon EKS compute types.

## kube-proxy and CoreDNS
<a name="hybrid-nodes-add-ons-core"></a>

EKS installs kube-proxy and CoreDNS as self-managed add-ons by default when you create an EKS cluster with the AWS API and AWS SDKs, including from the AWS CLI. You can overwrite these add-ons with Amazon EKS add-ons after cluster creation. Reference the EKS documentation for details on [Manage `kube-proxy` in Amazon EKS clusters](managing-kube-proxy.md) and [Manage CoreDNS for DNS in Amazon EKS clusters](managing-coredns.md). If you are running a mixed mode cluster with both hybrid nodes and nodes in AWS Cloud, AWS recommends having at least one CoreDNS replica on hybrid nodes and at least one CoreDNS replica on your nodes in AWS Cloud. See [Configure CoreDNS replicas](hybrid-nodes-webhooks.md#hybrid-nodes-mixed-coredns) for configuration steps.
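
For example, a minimal sketch of overwriting the self-managed CoreDNS installation with the Amazon EKS add-on by using the AWS CLI (the cluster name is a placeholder):

```
aws eks create-addon \
    --cluster-name my-cluster \
    --addon-name coredns \
    --resolve-conflicts OVERWRITE
```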

## CloudWatch Observability agent
<a name="hybrid-nodes-add-ons-cw"></a>

The CloudWatch Observability agent operator uses [webhooks](https://kubernetes.io/docs/reference/access-authn-authz/webhook/). If you run the operator on hybrid nodes, your on-premises pod CIDR must be routable on your on-premises network and you must configure your EKS cluster with your remote pod network. For more information, see [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md).

Node-level metrics are not available for hybrid nodes because [CloudWatch Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html) depends on the availability of [Instance Metadata Service](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html) (IMDS) for node-level metrics. Cluster, workload, pod, and container-level metrics are available for hybrid nodes.

After installing the add-on by following the steps described in [Install the CloudWatch agent with the Amazon CloudWatch Observability](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Observability-EKS-addon.html), the add-on manifest must be updated before the agent can run successfully on hybrid nodes. Edit the `amazoncloudwatchagents` resource on the cluster to add the `RUN_WITH_IRSA` environment variable as shown below.

```
kubectl edit amazoncloudwatchagents -n amazon-cloudwatch cloudwatch-agent
```

```
apiVersion: v1
items:
- apiVersion: cloudwatch.aws.amazon.com/v1alpha1
  kind: AmazonCloudWatchAgent
  metadata:
    ...
    name: cloudwatch-agent
    namespace: amazon-cloudwatch
    ...
  spec:
    ...
    env:
    - name: RUN_WITH_IRSA # <-- Add this
      value: "True" # <-- Add this
    - name: K8S_NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
          ...
```

## Amazon Managed Prometheus managed collector for hybrid nodes
<a name="hybrid-nodes-add-ons-amp"></a>

An Amazon Managed Service for Prometheus (AMP) managed collector consists of a scraper that discovers and collects metrics from the resources in an Amazon EKS cluster. AMP manages the scraper for you, removing the need to manage any instances, agents, or scrapers yourself.

You can use AMP managed collectors without any additional configuration specific to hybrid nodes. However, the metric endpoints for your applications on the hybrid nodes must be reachable from the VPC, including routes from the VPC to the remote pod network CIDRs, and the required ports must be open in your on-premises firewall. Additionally, your cluster must have [private cluster endpoint access](cluster-endpoint.md).

Follow the steps in [Using an AWS managed collector](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-collector-how-to.html) in the Amazon Managed Service for Prometheus User Guide.

## AWS Distro for OpenTelemetry (ADOT)
<a name="hybrid-nodes-add-ons-adot"></a>

You can use the AWS Distro for OpenTelemetry (ADOT) add-on to collect metrics, logs, and tracing data from your applications running on hybrid nodes. ADOT uses admission [webhooks](https://kubernetes.io/docs/reference/access-authn-authz/webhook/) to mutate and validate the Collector Custom Resource requests. If you run the ADOT operator on hybrid nodes, your on-premises pod CIDR must be routable on your on-premises network and you must configure your EKS cluster with your remote pod network. For more information, see [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md).

Follow the steps in [Getting Started with AWS Distro for OpenTelemetry using EKS Add-Ons](https://aws-otel.github.io/docs/getting-started/adot-eks-add-on) in the *AWS Distro for OpenTelemetry* documentation.

## AWS Load Balancer Controller
<a name="hybrid-nodes-add-ons-lbc"></a>

You can use the [AWS Load Balancer Controller](aws-load-balancer-controller.md) and Application Load Balancer (ALB) or Network Load Balancer (NLB) with the target type `ip` for workloads on hybrid nodes. The IP target(s) used with the ALB or NLB must be routable from AWS. The AWS Load Balancer Controller also uses [webhooks](https://kubernetes.io/docs/reference/access-authn-authz/webhook/). If you run the AWS Load Balancer Controller operator on hybrid nodes, your on-premises pod CIDR must be routable on your on-premises network and you must configure your EKS cluster with your remote pod network. For more information, see [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md).

To install the AWS Load Balancer Controller, follow the steps at [AWS Application Load Balancer](hybrid-nodes-ingress.md#hybrid-nodes-ingress-alb) or [AWS Network Load Balancer](hybrid-nodes-load-balancing.md#hybrid-nodes-service-lb-nlb).

For ingress with ALB, you must specify the annotations below. See [Route application and HTTP traffic with Application Load Balancers](alb-ingress.md) for more information.

```
alb.ingress.kubernetes.io/target-type: ip
```

For load balancing with NLB, you must specify the annotations below. See [Route TCP and UDP traffic with Network Load Balancers](network-load-balancing.md) for more information.

```
service.beta.kubernetes.io/aws-load-balancer-type: "external"
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
```
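
As an illustration, a Service of type `LoadBalancer` using these annotations might look like the following sketch; the name, selector, and ports are placeholders.

```
apiVersion: v1
kind: Service
metadata:
  name: my-app            # placeholder
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
spec:
  type: LoadBalancer
  selector:
    app: my-app           # placeholder
  ports:
  - port: 80
    targetPort: 8080
```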

## EKS Pod Identity Agent
<a name="hybrid-nodes-add-ons-pod-id"></a>

**Note**  
To successfully deploy the EKS Pod Identity Agent add-on on hybrid nodes running Bottlerocket, ensure your Bottlerocket version is at least v1.39.0. The Pod Identity Agent is not supported on earlier Bottlerocket versions in hybrid node environments.

The original Amazon EKS Pod Identity Agent DaemonSet relies on the availability of EC2 IMDS on the node to obtain the required AWS credentials. As IMDS isn’t available on hybrid nodes, starting with version 1.3.3-eksbuild.1, the Pod Identity Agent add-on optionally deploys a DaemonSet that mounts the required credentials. Hybrid nodes running Bottlerocket require a different method to mount the credentials, and starting in version 1.3.7-eksbuild.2, the Pod Identity Agent add-on optionally deploys a DaemonSet that specifically targets Bottlerocket hybrid nodes. The following sections describe the process for enabling the optional DaemonSets.

### Ubuntu/RHEL/AL2023
<a name="_ubunturhelal2023"></a>

1. To use the Pod Identity agent on Ubuntu/RHEL/AL2023 hybrid nodes, set `enableCredentialsFile: true` in the hybrid section of the `nodeadm` config as shown below:

   ```
   apiVersion: node.eks.aws/v1alpha1
   kind: NodeConfig
   spec:
       hybrid:
           enableCredentialsFile: true # <-- Add this
   ```

   This configures `nodeadm` to create a credentials file on the node under `/eks-hybrid/.aws/credentials`, which is used by the `eks-pod-identity-agent` pods. This credentials file contains temporary AWS credentials that are refreshed periodically.

1. After you update the `nodeadm` config on *each* node, run the following `nodeadm init` command with your `nodeConfig.yaml` to join your hybrid nodes to your Amazon EKS cluster. If your nodes have previously joined the cluster, run the `nodeadm init` command again anyway.

   ```
   nodeadm init -c file://nodeConfig.yaml
   ```

1. Install `eks-pod-identity-agent` with support for hybrid nodes enabled, by using either the AWS CLI or AWS Management Console.

   1.  AWS CLI: From the machine that you’re using to administer the cluster, run the following command to install `eks-pod-identity-agent` with support for hybrid nodes enabled. Replace `my-cluster` with the name of your cluster.

      ```
      aws eks create-addon \
          --cluster-name my-cluster \
          --addon-name eks-pod-identity-agent \
          --configuration-values '{"daemonsets":{"hybrid":{"create": true}}}'
      ```

   1.  AWS Management Console: If you are installing the Pod Identity Agent add-on through the AWS console, add the following to the optional configuration to deploy the DaemonSet that targets hybrid nodes.

      ```
      {"daemonsets":{"hybrid":{"create": true}}}
      ```

### Bottlerocket
<a name="_bottlerocket"></a>

1. To use the Pod Identity agent on Bottlerocket hybrid nodes, add the `--enable-credentials-file=true` flag to the command used for the Bottlerocket bootstrap container user data, as described in [Connect hybrid nodes with Bottlerocket](hybrid-nodes-bottlerocket.md).

   1. If you are using the SSM credential provider, your command should look like this:

      ```
      eks-hybrid-ssm-setup --activation-id=<activation-id> --activation-code=<activation-code> --region=<region> --enable-credentials-file=true
      ```

   1. If you are using the IAM Roles Anywhere credential provider, your command should look like this:

      ```
      eks-hybrid-iam-ra-setup --certificate=<certificate> --key=<private-key> --enable-credentials-file=true
      ```

      This will configure the bootstrap script to create a credentials file on the node under `/var/eks-hybrid/.aws/credentials`, which will be used by `eks-pod-identity-agent` pods. This credentials file will contain temporary AWS credentials that will be refreshed periodically.

1. Install `eks-pod-identity-agent` with support for Bottlerocket hybrid nodes enabled, by using either the AWS CLI or AWS Management Console.

   1.  AWS CLI: From the machine that you’re using to administer the cluster, run the following command to install `eks-pod-identity-agent` with support for Bottlerocket hybrid nodes enabled. Replace `my-cluster` with the name of your cluster.

      ```
      aws eks create-addon \
          --cluster-name my-cluster \
          --addon-name eks-pod-identity-agent \
          --configuration-values '{"daemonsets":{"hybrid-bottlerocket":{"create": true}}}'
      ```

   1.  AWS Management Console: If you are installing the Pod Identity Agent add-on through the AWS console, add the following to the optional configuration to deploy the DaemonSet that targets Bottlerocket hybrid nodes.

      ```
      {"daemonsets":{"hybrid-bottlerocket":{"create": true}}}
      ```

## CSI snapshot controller
<a name="hybrid-nodes-add-ons-csi-snapshotter"></a>

Starting with version `v8.1.0-eksbuild.2`, the [CSI snapshot controller add-on](csi-snapshot-controller.md) applies a soft anti-affinity rule for hybrid nodes, preferring the controller `deployment` to run on EC2 in the same AWS Region as the Amazon EKS control plane. Co-locating the `deployment` in the same AWS Region as the Amazon EKS control plane improves latency.

## Community add-ons
<a name="hybrid-nodes-add-ons-community"></a>

The sections that follow describe differences between running compatible community add-ons on hybrid nodes compared to other Amazon EKS compute types.

## Kubernetes Metrics Server
<a name="hybrid-nodes-add-ons-metrics-server"></a>

The Amazon EKS control plane must be able to reach the Metrics Server pod IP address (or the node IP address if `hostNetwork` is enabled). Therefore, unless you run Metrics Server in `hostNetwork` mode, you must configure a remote pod network when creating your Amazon EKS cluster, and you must make your pod IP addresses routable. Implementing Border Gateway Protocol (BGP) with the CNI is one common way to make your pod IP addresses routable.
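
After installing Metrics Server, one way to check that the control plane can reach it is to verify that the `APIService` that Metrics Server registers reports as available and that node metrics are returned. This is a quick sketch, not an exhaustive check:

```
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes
```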

## cert-manager
<a name="hybrid-nodes-add-ons-cert-manager"></a>

 `cert-manager` uses [webhooks](https://kubernetes.io/docs/reference/access-authn-authz/webhook/). If you run `cert-manager` on hybrid nodes, your on-premises pod CIDR must be routable on your on-premises network and you must configure your EKS cluster with your remote pod network. For more information, see [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md).

# Configure webhooks for hybrid nodes
<a name="hybrid-nodes-webhooks"></a>

This page details considerations for running webhooks with hybrid nodes. Webhooks are used in Kubernetes applications and open source projects, such as the AWS Load Balancer Controller and CloudWatch Observability Agent, to perform mutating and validation capabilities at runtime.

 **Routable pod networks** 

If you are able to make your on-premises pod CIDR routable on your on-premises network, you can run webhooks on hybrid nodes. There are several techniques you can use to make your on-premises pod CIDR routable on your on-premises network including Border Gateway Protocol (BGP), static routes, or other custom routing solutions. BGP is the recommended solution as it is more scalable and easier to manage than alternative solutions that require custom or manual route configuration. AWS supports the BGP capabilities of Cilium and Calico for advertising pod CIDRs, see [Configure CNI for hybrid nodes](hybrid-nodes-cni.md) and [Routable remote Pod CIDRs](hybrid-nodes-concepts-kubernetes.md#hybrid-nodes-concepts-k8s-pod-cidrs) for more information.

 **Unroutable pod networks** 

If you *cannot* make your on-premises pod CIDR routable on your on-premises network and need to run webhooks, we recommend that you run all webhooks on cloud nodes in the same EKS cluster as your hybrid nodes.

## Considerations for mixed mode clusters
<a name="hybrid-nodes-considerations-mixed-mode"></a>

 *Mixed mode clusters* are defined as EKS clusters that have both hybrid nodes and nodes running in AWS Cloud. When running a mixed mode cluster, consider the following recommendations:
+ Run the VPC CNI on nodes in AWS Cloud and either Cilium or Calico on hybrid nodes. Cilium and Calico are not supported by AWS when running on nodes in AWS Cloud.
+ Configure webhooks to run on nodes in AWS Cloud. See [Configure webhooks for add-ons](#hybrid-nodes-webhooks-add-ons) for how to configure the webhooks for AWS and community add-ons.
+ If your applications require pods running on nodes in AWS Cloud to directly communicate with pods running on hybrid nodes ("east-west communication"), and you are using the VPC CNI on nodes in AWS Cloud, and Cilium or Calico on hybrid nodes, then your on-premises pod CIDR must be routable on your on-premises network.
+ Run at least one replica of CoreDNS on nodes in AWS Cloud and at least one replica of CoreDNS on hybrid nodes.
+ Configure Service Traffic Distribution to keep Service traffic local to the zone it is originating from. For more information on Service Traffic Distribution, see [Configure Service Traffic Distribution](#hybrid-nodes-mixed-service-traffic-distribution).
+ If you are using AWS Application Load Balancers (ALB) or Network Load Balancers (NLB) for workload traffic running on hybrid nodes, then the IP target(s) used with the ALB or NLB must be routable from AWS.
+ The Metrics Server add-on requires connectivity from the EKS control plane to the Metrics Server pod IP address. If you are running the Metrics Server add-on on hybrid nodes, then your on-premises pod CIDR must be routable on your on-premises network.
+ To collect metrics for hybrid nodes using Amazon Managed Service for Prometheus (AMP) managed collectors, your on-premises pod CIDR must be routable on your on-premises network. Or, you can use the AMP managed collector for EKS control plane metrics and resources running in AWS Cloud, and the AWS Distro for OpenTelemetry (ADOT) add-on to collect metrics for hybrid nodes.

## Configure mixed mode clusters
<a name="hybrid-nodes-mixed-mode"></a>

To view the mutating and validating webhooks running on your cluster, you can view the **Extensions** resource type in the **Resources** panel of the EKS console for your cluster, or you can use the following commands. EKS also reports webhook metrics in the cluster observability dashboard, see [Monitor your cluster with the observability dashboard](observability-dashboard.md) for more information.

```
kubectl get mutatingwebhookconfigurations
```

```
kubectl get validatingwebhookconfigurations
```

### Configure Service Traffic Distribution
<a name="hybrid-nodes-mixed-service-traffic-distribution"></a>

When running mixed mode clusters, we recommend that you use [Service Traffic Distribution](https://kubernetes.io/docs/reference/networking/virtual-ips/#traffic-distribution) to keep Service traffic local to the zone it originates from. Service Traffic Distribution (available for Kubernetes versions 1.31 and later in EKS) is the recommended solution over [Topology Aware Routing](https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/) because it is more predictable. With Service Traffic Distribution, healthy endpoints in the zone receive all of the traffic for that zone. With Topology Aware Routing, each service must meet several conditions in that zone to apply the custom routing; otherwise it routes traffic evenly to all endpoints.

If you are using Cilium as your CNI, you must run the CNI with the `enable-service-topology` option set to `true` to enable Service Traffic Distribution. You can pass this configuration with the Helm install flag `--set loadBalancer.serviceTopology=true`, or you can update an existing installation with the Cilium CLI command `cilium config set enable-service-topology true`. The Cilium agent running on each node must be restarted after updating the configuration for an existing installation.
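
The following is a minimal sketch of updating an existing installation and restarting the agents, assuming the default Cilium DaemonSet name `cilium` in the `kube-system` namespace (the explicit rollout restart may be unnecessary if your Cilium CLI version restarts the agents for you):

```
# update the running Cilium configuration
cilium config set enable-service-topology true
# restart the Cilium agents so they pick up the new setting
kubectl rollout restart daemonset/cilium -n kube-system
```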

An example of how to configure Service Traffic Distribution for the CoreDNS Service is shown in the following section, and we recommend that you enable the same for all Services in your cluster to avoid unintended cross-environment traffic.

### Configure CoreDNS replicas
<a name="hybrid-nodes-mixed-coredns"></a>

If you are running a mixed mode cluster with both hybrid nodes and nodes in AWS Cloud, we recommend that you have at least one CoreDNS replica on hybrid nodes and at least one CoreDNS replica on your nodes in AWS Cloud. To prevent latency and network issues in a mixed mode cluster setup, you can configure the CoreDNS Service to prefer the closest CoreDNS replica with [Service Traffic Distribution](https://kubernetes.io/docs/reference/networking/virtual-ips/#traffic-distribution).

 *Service Traffic Distribution* (available for Kubernetes versions 1.31 and later in EKS) is the recommended solution over [Topology Aware Routing](https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/) because it is more predictable. In Service Traffic Distribution, healthy endpoints in the zone will receive all of the traffic for that zone. In Topology Aware Routing, each service must meet several conditions in that zone to apply the custom routing, otherwise it routes traffic evenly to all endpoints. The following steps configure Service Traffic Distribution.

If you are using Cilium as your CNI, you must run the CNI with the `enable-service-topology` set to `true` to enable Service Traffic Distribution. You can pass this configuration with the Helm install flag `--set loadBalancer.serviceTopology=true` or you can update an existing installation with the Cilium CLI command `cilium config set enable-service-topology true`. The Cilium agent running on each node must be restarted after updating the configuration for an existing installation.

1. Add a topology zone label for each of your hybrid nodes, for example `topology.kubernetes.io/zone: onprem`. Or, you can set the label at the `nodeadm init` phase by specifying the label in your `nodeadm` configuration; see [Node Config for customizing kubelet (Optional)](hybrid-nodes-nodeadm.md#hybrid-nodes-nodeadm-kubelet). Note that nodes running in AWS Cloud automatically get a topology zone label applied to them that corresponds to the availability zone (AZ) of the node.

   ```
   kubectl label node hybrid-node-name topology.kubernetes.io/zone=zone
   ```

1. Add `podAntiAffinity` to the CoreDNS deployment with the topology zone key. Or, you can configure the CoreDNS deployment during installation with EKS add-ons.

   ```
   kubectl edit deployment coredns -n kube-system
   ```

   ```
   spec:
     template:
       spec:
         affinity:
          ...
           podAntiAffinity:
             preferredDuringSchedulingIgnoredDuringExecution:
             - podAffinityTerm:
                 labelSelector:
                   matchExpressions:
                   - key: k8s-app
                     operator: In
                     values:
                     - kube-dns
                 topologyKey: kubernetes.io/hostname
               weight: 100
             - podAffinityTerm:
                 labelSelector:
                   matchExpressions:
                   - key: k8s-app
                     operator: In
                     values:
                     - kube-dns
                 topologyKey: topology.kubernetes.io/zone
               weight: 50
         ...
   ```

1. Add the setting `trafficDistribution: PreferClose` to the `kube-dns` Service configuration to enable Service Traffic Distribution.

   ```
   kubectl patch svc kube-dns -n kube-system --type=merge -p '{
     "spec": {
       "trafficDistribution": "PreferClose"
     }
   }'
   ```

1. You can confirm that Service Traffic Distribution is enabled by viewing the endpoint slices for the `kube-dns` Service. Your endpoint slices must show the `hints` for your topology zone labels, which confirms that Service Traffic Distribution is enabled. If you do not see the `hints` for each endpoint address, then Service Traffic Distribution is not enabled.

   ```
   kubectl get endpointslice -A | grep "kube-dns"
   ```

   ```
   kubectl get endpointslice kube-dns-<id> -n kube-system -o yaml
   ```

   ```
   addressType: IPv4
   apiVersion: discovery.k8s.io/v1
   endpoints:
   - addresses:
     - <your-hybrid-node-pod-ip>
     hints:
       forZones:
       - name: onprem
     nodeName: <your-hybrid-node-name>
     zone: onprem
   - addresses:
     - <your-cloud-node-pod-ip>
     hints:
       forZones:
       - name: us-west-2a
     nodeName: <your-cloud-node-name>
     zone: us-west-2a
   ```

### Configure webhooks for add-ons
<a name="hybrid-nodes-webhooks-add-ons"></a>

The following add-ons use webhooks and are supported for use with hybrid nodes.
+  AWS Load Balancer Controller
+ CloudWatch Observability Agent
+  AWS Distro for OpenTelemetry (ADOT)
+  `cert-manager` 

See the following sections for configuring the webhooks used by these add-ons to run on nodes in AWS Cloud.

#### AWS Load Balancer Controller
<a name="hybrid-nodes-mixed-lbc"></a>

To use the AWS Load Balancer Controller in a mixed mode cluster setup, you must run the controller on nodes in AWS Cloud. To do so, add the following to your Helm values configuration or specify the values by using EKS add-on configuration.

```
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: eks.amazonaws.com/compute-type
          operator: NotIn
          values:
          - hybrid
```

#### CloudWatch Observability Agent
<a name="hybrid-nodes-mixed-cwagent"></a>

The CloudWatch Observability Agent add-on has a Kubernetes Operator that uses webhooks. To run the operator on nodes in AWS Cloud in a mixed mode cluster setup, edit the CloudWatch Observability Agent operator configuration. You can’t configure the operator affinity during installation with Helm and EKS add-ons (see [containers-roadmap issue #2431](https://github.com/aws/containers-roadmap/issues/2431)).

```
kubectl edit -n amazon-cloudwatch deployment amazon-cloudwatch-observability-controller-manager
```

```
spec:
  ...
  template:
    ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: eks.amazonaws.com/compute-type
                operator: NotIn
                values:
                - hybrid
```

#### AWS Distro for OpenTelemetry (ADOT)
<a name="hybrid-nodes-mixed-adot"></a>

The AWS Distro for OpenTelemetry (ADOT) add-on has a Kubernetes Operator that uses webhooks. To run the operator on nodes in AWS Cloud in a mixed mode cluster setup, add the following to your Helm values configuration or specify the values by using EKS add-on configuration.

```
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: eks.amazonaws.com/compute-type
          operator: NotIn
          values:
          - hybrid
```

If your pod CIDR is not routable on your on-premises network, then the ADOT collector must run on hybrid nodes to scrape the metrics from your hybrid nodes and the workloads running on them. To do so, edit the Custom Resource Definition (CRD).

```
kubectl -n opentelemetry-operator-system edit opentelemetrycollectors.opentelemetry.io adot-col-prom-metrics
```

```
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: In
            values:
            - hybrid
```

You can configure the ADOT collector to only scrape metrics from hybrid nodes and the resources running on hybrid nodes by adding the following `relabel_configs` to each `scrape_configs` in the ADOT collector CRD configuration.

```
relabel_configs:
  - action: keep
    regex: hybrid
    source_labels:
    - __meta_kubernetes_node_label_eks_amazonaws_com_compute_type
```

The ADOT add-on has a prerequisite requirement to install `cert-manager` for the TLS certificates used by the ADOT operator webhook. `cert-manager` also runs webhooks and you can configure it to run on nodes in AWS Cloud with the following Helm values configuration.

```
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: eks.amazonaws.com/compute-type
          operator: NotIn
          values:
          - hybrid
webhook:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: NotIn
            values:
            - hybrid
cainjector:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: NotIn
            values:
            - hybrid
startupapicheck:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: NotIn
            values:
            - hybrid
```

#### `cert-manager`
<a name="hybrid-nodes-mixed-cert-manager"></a>

The `cert-manager` add-on runs webhooks and you can configure it to run on nodes in AWS Cloud with the following Helm values configuration.

```
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: eks.amazonaws.com/compute-type
          operator: NotIn
          values:
          - hybrid
webhook:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: NotIn
            values:
            - hybrid
cainjector:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: NotIn
            values:
            - hybrid
startupapicheck:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: NotIn
            values:
            - hybrid
```

# Configure proxy for hybrid nodes
<a name="hybrid-nodes-proxy"></a>

If you are using a proxy server in your on-premises environment for traffic leaving your data center or edge environment, you need to separately configure your nodes and your cluster to use your proxy server.

Cluster  
On your cluster, you need to configure `kube-proxy` to use your proxy server. You must configure `kube-proxy` after creating your Amazon EKS cluster.

Nodes  
On your nodes, you must configure the operating system, `containerd`, `kubelet`, and the Amazon SSM agent to use your proxy server. You can make these changes during the build process for your operating system images or before you run `nodeadm init` on each hybrid node.

## Node-level configuration
<a name="_node_level_configuration"></a>

You must apply the following configurations either in your operating system images or before running `nodeadm init` on each hybrid node.

### `containerd` proxy configuration
<a name="_containerd_proxy_configuration"></a>

 `containerd` is the default container runtime for Kubernetes. If you are using a proxy for internet access, you must configure `containerd` so it can pull the container images required by Kubernetes and Amazon EKS.

Create a file on each hybrid node called `http-proxy.conf` in the `/etc/systemd/system/containerd.service.d` directory with the following contents. Replace `proxy-domain` and `port` with the values for your environment.

```
[Service]
Environment="HTTP_PROXY=http://proxy-domain:port"
Environment="HTTPS_PROXY=http://proxy-domain:port"
Environment="NO_PROXY=localhost"
```

#### `containerd` configuration from user data
<a name="_containerd_configuration_from_user_data"></a>

The `containerd.service.d` directory will need to be created for this file. You will need to reload systemd to pick up the configuration file without a reboot. In AL2023, the service will likely already be running when your script executes, so you will also need to restart it.

```
mkdir -p /etc/systemd/system/containerd.service.d
echo '[Service]' > /etc/systemd/system/containerd.service.d/http-proxy.conf
echo 'Environment="HTTP_PROXY=http://proxy-domain:port"' >> /etc/systemd/system/containerd.service.d/http-proxy.conf
echo 'Environment="HTTPS_PROXY=http://proxy-domain:port"' >> /etc/systemd/system/containerd.service.d/http-proxy.conf
echo 'Environment="NO_PROXY=localhost"' >> /etc/systemd/system/containerd.service.d/http-proxy.conf
systemctl daemon-reload
systemctl restart containerd
```

### `kubelet` proxy configuration
<a name="_kubelet_proxy_configuration"></a>

 `kubelet` is the Kubernetes node agent that runs on each Kubernetes node and is responsible for managing the node and pods running on it. If you are using a proxy in your on-premises environment, you must configure the `kubelet` so it can communicate with your Amazon EKS cluster’s public or private endpoints.

Create a file on each hybrid node called `http-proxy.conf` in the `/etc/systemd/system/kubelet.service.d/` directory with the following content. Replace `proxy-domain` and `port` with the values for your environment.

```
[Service]
Environment="HTTP_PROXY=http://proxy-domain:port"
Environment="HTTPS_PROXY=http://proxy-domain:port"
Environment="NO_PROXY=localhost"
```

#### `kubelet` configuration from user data
<a name="_kubelet_configuration_from_user_data"></a>

The `kubelet.service.d` directory must be created for this file. You will need to reload systemd to pick up the configuration file without a reboot. In AL2023, the service will likely already be running when your script executes, so you will also need to restart it.

```
mkdir -p /etc/systemd/system/kubelet.service.d
echo '[Service]' > /etc/systemd/system/kubelet.service.d/http-proxy.conf
echo 'Environment="HTTP_PROXY=http://proxy-domain:port"' >> /etc/systemd/system/kubelet.service.d/http-proxy.conf
echo 'Environment="HTTPS_PROXY=http://proxy-domain:port"' >> /etc/systemd/system/kubelet.service.d/http-proxy.conf
echo 'Environment="NO_PROXY=localhost"' >> /etc/systemd/system/kubelet.service.d/http-proxy.conf
systemctl daemon-reload
systemctl restart kubelet
```

### `ssm` proxy configuration
<a name="_ssm_proxy_configuration"></a>

 `ssm` is one of the credential providers that can be used to initialize a hybrid node. The SSM agent is responsible for authenticating with AWS and generating temporary credentials that are used by `kubelet`. If you are using a proxy in your on-premises environment and using `ssm` as your credential provider on the node, you must configure the SSM agent so it can communicate with Amazon SSM service endpoints.

Create a file on each hybrid node called `http-proxy.conf` in the path below, depending on the operating system:
+ Ubuntu - `/etc/systemd/system/snap.amazon-ssm-agent.amazon-ssm-agent.service.d/http-proxy.conf` 
+ Amazon Linux 2023 and Red Hat Enterprise Linux - `/etc/systemd/system/amazon-ssm-agent.service.d/http-proxy.conf` 

Populate the file with the following contents. Replace `proxy-domain` and `port` with the values for your environment.

```
[Service]
Environment="HTTP_PROXY=http://proxy-domain:port"
Environment="HTTPS_PROXY=http://proxy-domain:port"
Environment="NO_PROXY=localhost"
```

#### `ssm` configuration from user data
<a name="_ssm_configuration_from_user_data"></a>

The `ssm` systemd service file directory must be created for this file. The directory path depends on the operating system used on the node.
+ Ubuntu - `/etc/systemd/system/snap.amazon-ssm-agent.amazon-ssm-agent.service.d` 
+ Amazon Linux 2023 and Red Hat Enterprise Linux - `/etc/systemd/system/amazon-ssm-agent.service.d` 

Replace the systemd service name in the restart command below, depending on the operating system used on the node:
+ Ubuntu - `snap.amazon-ssm-agent.amazon-ssm-agent` 
+ Amazon Linux 2023 and Red Hat Enterprise Linux - `amazon-ssm-agent` 

```
mkdir -p systemd-service-file-directory
echo '[Service]' > systemd-service-file-directory/http-proxy.conf
echo 'Environment="HTTP_PROXY=http://proxy-domain:port"' >> systemd-service-file-directory/http-proxy.conf
echo 'Environment="HTTPS_PROXY=http://proxy-domain:port"' >> systemd-service-file-directory/http-proxy.conf
echo 'Environment="NO_PROXY=localhost"' >> systemd-service-file-directory/http-proxy.conf
systemctl daemon-reload
systemctl restart systemd-service-name
```

### Operating system proxy configuration
<a name="_operating_system_proxy_configuration"></a>

If you are using a proxy for internet access, you must configure your operating system so that it can pull the hybrid nodes dependencies from your operating system's package manager.

 **Ubuntu** 

1. Configure `snap` to use your proxy with the following commands:

   ```
   sudo snap set system proxy.https=http://proxy-domain:port
   sudo snap set system proxy.http=http://proxy-domain:port
   ```

1. To enable the proxy for `apt`, create a file called `apt.conf` in the `/etc/apt/` directory with the following contents. Replace `proxy-domain` and `port` with the values for your environment.

   ```
   Acquire::http::Proxy "http://proxy-domain:port";
   Acquire::https::Proxy "http://proxy-domain:port";
   ```

 **Amazon Linux 2023** 

1. Configure `dnf` to use your proxy by adding the following line to `/etc/dnf/dnf.conf`. Replace `proxy-domain` and `port` with the values for your environment.

   ```
   proxy=http://proxy-domain:port
   ```

 **Red Hat Enterprise Linux** 

1. Configure `yum` to use your proxy by adding the following line to `/etc/yum.conf`. Replace `proxy-domain` and `port` with the values for your environment.

   ```
   proxy=http://proxy-domain:port
   ```

### IAM Roles Anywhere proxy configuration
<a name="_iam_roles_anywhere_proxy_configuration"></a>

The IAM Roles Anywhere credential provider service is responsible for refreshing credentials when using IAM Roles Anywhere with the `enableCredentialsFile` flag (see [EKS Pod Identity Agent](hybrid-nodes-add-ons.md#hybrid-nodes-add-ons-pod-id)). If you are using a proxy in your on-premises environment, you must configure the service so it can communicate with IAM Roles Anywhere endpoints.

Create a file called `http-proxy.conf` in the `/etc/systemd/system/aws_signing_helper_update.service.d/` directory with the following content. Replace `proxy-domain` and `port` with the values for your environment.

```
[Service]
Environment="HTTP_PROXY=http://proxy-domain:port"
Environment="HTTPS_PROXY=http://proxy-domain:port"
Environment="NO_PROXY=localhost"
```
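
If you apply this configuration from user data or an automation script, a minimal sketch follows the same pattern used for the other services on this page. The `aws_signing_helper_update` service name below is inferred from the directory path above; confirm the unit name on your nodes before restarting it, and replace `proxy-domain` and `port` with the values for your environment.

```
mkdir -p /etc/systemd/system/aws_signing_helper_update.service.d
echo '[Service]' > /etc/systemd/system/aws_signing_helper_update.service.d/http-proxy.conf
echo 'Environment="HTTP_PROXY=http://proxy-domain:port"' >> /etc/systemd/system/aws_signing_helper_update.service.d/http-proxy.conf
echo 'Environment="HTTPS_PROXY=http://proxy-domain:port"' >> /etc/systemd/system/aws_signing_helper_update.service.d/http-proxy.conf
echo 'Environment="NO_PROXY=localhost"' >> /etc/systemd/system/aws_signing_helper_update.service.d/http-proxy.conf
systemctl daemon-reload
systemctl restart aws_signing_helper_update
```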

## Cluster wide configuration
<a name="_cluster_wide_configuration"></a>

The configurations in this section must be applied after you create your Amazon EKS cluster and before running `nodeadm init` on each hybrid node.

### kube-proxy proxy configuration
<a name="_kube_proxy_proxy_configuration"></a>

Amazon EKS automatically installs `kube-proxy` on each hybrid node as a DaemonSet when your hybrid nodes join the cluster. `kube-proxy` enables routing across services that are backed by pods on Amazon EKS clusters. To configure each host, `kube-proxy` requires DNS resolution for your Amazon EKS cluster endpoint.
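
For example, a quick way to check this from a hybrid node is to resolve the cluster endpoint hostname. `CLUSTER_ENDPOINT` below is a placeholder for your cluster's API server endpoint hostname (the endpoint URL without the `https://` scheme); this check is illustrative only.

```
nslookup CLUSTER_ENDPOINT
```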

1. Edit the `kube-proxy` DaemonSet with the following command.

   ```
   kubectl -n kube-system edit ds kube-proxy
   ```

   This opens the `kube-proxy` DaemonSet definition in your configured editor.

1. Add the environment variables for `HTTP_PROXY` and `HTTPS_PROXY`. Note that the `NODE_NAME` environment variable should already exist in your configuration. Replace `proxy-domain` and `port` with the values for your environment.

   ```
   containers:
     - command:
       - kube-proxy
       - --v=2
       - --config=/var/lib/kube-proxy-config/config
       - --hostname-override=$(NODE_NAME)
       env:
       - name: HTTP_PROXY
         value: http://proxy-domain:port
       - name: HTTPS_PROXY
         value: http://proxy-domain:port
       - name: NODE_NAME
         valueFrom:
           fieldRef:
             apiVersion: v1
             fieldPath: spec.nodeName
   ```
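
   After saving your changes, you can confirm the proxy variables were applied. The command below is an illustrative check; the `jsonpath` expression assumes the `kube-proxy` container is the first container in the Pod template.

   ```
   kubectl -n kube-system get ds kube-proxy -o jsonpath='{.spec.template.spec.containers[0].env}'
   ```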

# Configure Cilium BGP for hybrid nodes
<a name="hybrid-nodes-cilium-bgp"></a>

This topic describes how to configure Cilium Border Gateway Protocol (BGP) for Amazon EKS Hybrid Nodes. Cilium’s BGP functionality is called [Cilium BGP Control Plane](https://docs.cilium.io/en/stable/network/bgp-control-plane/bgp-control-plane/) and can be used to advertise pod and service addresses to your on-premises network. For alternative methods to make pod CIDRs routable on your on-premises network, see [Routable remote Pod CIDRs](hybrid-nodes-concepts-kubernetes.md#hybrid-nodes-concepts-k8s-pod-cidrs).

## Configure Cilium BGP
<a name="hybrid-nodes-cilium-bgp-configure"></a>

### Prerequisites
<a name="_prerequisites"></a>
+ Cilium installed following the instructions in [Configure CNI for hybrid nodes](hybrid-nodes-cni.md).

### Procedure
<a name="_procedure"></a>

1. To use BGP with Cilium to advertise pod or service addresses to your on-premises network, Cilium must be installed with `bgpControlPlane.enabled: true`. If you are enabling BGP for an existing Cilium deployment where BGP was not previously enabled, you must restart the Cilium operator to apply the BGP configuration. You can set `operator.rollOutPods` to `true` in your Helm values to restart the Cilium operator as part of the Helm install/upgrade process.

   ```
   helm upgrade cilium oci://public.ecr.aws/eks/cilium/cilium \
     --namespace kube-system \
     --reuse-values \
     --set operator.rollOutPods=true \
     --set bgpControlPlane.enabled=true
   ```

1. Confirm that the Cilium operator and agents were restarted and are running.

   ```
   kubectl -n kube-system get pods --selector=app.kubernetes.io/part-of=cilium
   ```

   ```
   NAME                               READY   STATUS    RESTARTS   AGE
   cilium-grwlc                       1/1     Running   0          4m12s
   cilium-operator-68f7766967-5nnbl   1/1     Running   0          4m20s
   cilium-operator-68f7766967-7spfz   1/1     Running   0          4m20s
   cilium-pnxcv                       1/1     Running   0          6m29s
   cilium-r7qkj                       1/1     Running   0          4m12s
   cilium-wxhfn                       1/1     Running   0          4m1s
   cilium-z7hlb                       1/1     Running   0          6m30s
   ```

1. Create a file called `cilium-bgp-cluster.yaml` with a `CiliumBGPClusterConfig` definition. You may need to obtain the following information from your network administrator.
   + Configure `localASN` with the ASN for the nodes running Cilium.
   + Configure `peerASN` with the ASN for your on-premises router.
   + Configure the `peerAddress` with the on-premises router IP that each node running Cilium will peer with.

     ```
     apiVersion: cilium.io/v2alpha1
     kind: CiliumBGPClusterConfig
     metadata:
       name: cilium-bgp
     spec:
       nodeSelector:
         matchExpressions:
         - key: eks.amazonaws.com/compute-type
           operator: In
           values:
           - hybrid
       bgpInstances:
       - name: "rack0"
         localASN: NODES_ASN
         peers:
         - name: "onprem-router"
           peerASN: ONPREM_ROUTER_ASN
           peerAddress: ONPREM_ROUTER_IP
           peerConfigRef:
             name: "cilium-peer"
     ```

1. Apply the Cilium BGP cluster configuration to your cluster.

   ```
   kubectl apply -f cilium-bgp-cluster.yaml
   ```

1. Create a file named `cilium-bgp-peer.yaml` with the `CiliumBGPPeerConfig` resource that defines a BGP peer configuration. Multiple peers can share the same configuration and provide reference to the common `CiliumBGPPeerConfig` resource. See the [BGP Peer configuration](https://docs.cilium.io/en/latest/network/bgp-control-plane/bgp-control-plane-v2/#bgp-peer-configuration) in the Cilium documentation for a full list of configuration options.

   The values for the following Cilium peer settings must match those of the on-premises router you are peering with.
   + Configure `holdTimeSeconds`, which determines how long a BGP peer waits for a keepalive or update message before declaring the session down. The default is 90 seconds.
   + Configure `keepAliveTimeSeconds`, which sets the interval between keepalive messages used to confirm that a BGP peer is still reachable and the BGP session is active. The default is 30 seconds.
   + Configure `restartTimeSeconds`, which determines the time within which Cilium’s BGP control plane is expected to re-establish the BGP session after a restart. The default is 120 seconds.

     ```
     apiVersion: cilium.io/v2alpha1
     kind: CiliumBGPPeerConfig
     metadata:
       name: cilium-peer
     spec:
       timers:
         holdTimeSeconds: 90
         keepAliveTimeSeconds: 30
       gracefulRestart:
         enabled: true
         restartTimeSeconds: 120
       families:
         - afi: ipv4
           safi: unicast
           advertisements:
             matchLabels:
               advertise: "bgp"
     ```

1. Apply the Cilium BGP peer configuration to your cluster.

   ```
   kubectl apply -f cilium-bgp-peer.yaml
   ```

1. Create a file named `cilium-bgp-advertisement-pods.yaml` with a `CiliumBGPAdvertisement` resource to advertise the pod CIDRs to your on-premises network.
   + The `CiliumBGPAdvertisement` resource is used to define advertisement types and attributes associated with them. The example below configures Cilium to advertise only pod CIDRs. See the examples in [Service type LoadBalancer](hybrid-nodes-ingress.md#hybrid-nodes-ingress-cilium-loadbalancer) and [Cilium in-cluster load balancing](hybrid-nodes-load-balancing.md#hybrid-nodes-service-lb-cilium) for more information on configuring Cilium to advertise service addresses.
   + Each hybrid node running the Cilium agent peers with the upstream BGP-enabled router. Each node advertises the pod CIDR range that it owns when Cilium’s `advertisementType` is set to `PodCIDR`, as in the example below. See the [BGP Advertisements configuration](https://docs.cilium.io/en/stable/network/bgp-control-plane/bgp-control-plane-v2/#bgp-advertisements) in the Cilium documentation for more information.

     ```
     apiVersion: cilium.io/v2alpha1
     kind: CiliumBGPAdvertisement
     metadata:
       name: bgp-advertisement-pods
       labels:
         advertise: bgp
     spec:
       advertisements:
         - advertisementType: "PodCIDR"
     ```

1. Apply the Cilium BGP Advertisement configuration to your cluster.

   ```
   kubectl apply -f cilium-bgp-advertisement-pods.yaml
   ```

1. You can confirm that the BGP peering is working with the [Cilium CLI](https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/#install-the-cilium-cli) by using the `cilium bgp peers` command. You should see the correct values for your environment in the output and the Session State as `established`. See the [Troubleshooting and Operations Guide](https://docs.cilium.io/en/latest/network/bgp-control-plane/bgp-control-plane/#troubleshooting-and-operation-guide) in the Cilium documentation for more information on troubleshooting.

   In the examples below, there are five hybrid nodes running the Cilium agent and each node is advertising the Pod CIDR range that it owns.

   ```
   cilium bgp peers
   ```

   ```
   Node                   Local AS    Peer AS             Peer Address       Session State   Uptime     Family         Received   Advertised
   mi-026d6a261e355fba7   NODES_ASN   ONPREM_ROUTER_ASN   ONPREM_ROUTER_IP   established     1h18m58s   ipv4/unicast   1          2
   mi-082f73826a163626e   NODES_ASN   ONPREM_ROUTER_ASN   ONPREM_ROUTER_IP   established     1h19m12s   ipv4/unicast   1          2
   mi-09183e8a3d755abf6   NODES_ASN   ONPREM_ROUTER_ASN   ONPREM_ROUTER_IP   established     1h18m47s   ipv4/unicast   1          2
   mi-0d78d815980ed202d   NODES_ASN   ONPREM_ROUTER_ASN   ONPREM_ROUTER_IP   established     1h19m12s   ipv4/unicast   1          2
   mi-0daa253999fe92daa   NODES_ASN   ONPREM_ROUTER_ASN   ONPREM_ROUTER_IP   established     1h18m58s   ipv4/unicast   1          2
   ```

   ```
   cilium bgp routes
   ```

   ```
   Node                   VRouter       Prefix           NextHop   Age         Attrs
   mi-026d6a261e355fba7   NODES_ASN     10.86.2.0/26     0.0.0.0   1h16m46s   [{Origin: i} {Nexthop: 0.0.0.0}]
   mi-082f73826a163626e   NODES_ASN     10.86.2.192/26   0.0.0.0   1h16m46s   [{Origin: i} {Nexthop: 0.0.0.0}]
   mi-09183e8a3d755abf6   NODES_ASN     10.86.2.64/26    0.0.0.0   1h16m46s   [{Origin: i} {Nexthop: 0.0.0.0}]
   mi-0d78d815980ed202d   NODES_ASN     10.86.2.128/26   0.0.0.0   1h16m46s   [{Origin: i} {Nexthop: 0.0.0.0}]
   mi-0daa253999fe92daa   NODES_ASN     10.86.3.0/26     0.0.0.0   1h16m46s   [{Origin: i} {Nexthop: 0.0.0.0}]
   ```

# Configure Kubernetes Ingress for hybrid nodes
<a name="hybrid-nodes-ingress"></a>

This topic describes how to configure Kubernetes Ingress for workloads running on Amazon EKS Hybrid Nodes. [Kubernetes Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. To make use of Ingress resources, a Kubernetes Ingress controller is required to set up the networking infrastructure and components that serve the network traffic.

 AWS supports AWS Application Load Balancer (ALB) and Cilium for Kubernetes Ingress for workloads running on EKS Hybrid Nodes. The decision to use ALB or Cilium for Ingress is based on the source of application traffic. If application traffic originates from an AWS Region, AWS recommends using AWS ALB and the AWS Load Balancer Controller. If application traffic originates from the local on-premises or edge environment, AWS recommends using Cilium’s built-in Ingress capabilities, which can be used with or without load balancer infrastructure in your environment.

![EKS Hybrid Nodes Ingress](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-ingress.png)


## AWS Application Load Balancer
<a name="hybrid-nodes-ingress-alb"></a>

You can use the [AWS Load Balancer Controller](aws-load-balancer-controller.md) and Application Load Balancer (ALB) with the target type `ip` for workloads running on hybrid nodes. When using target type `ip`, ALB forwards traffic directly to the pods, bypassing the Service layer network path. For ALB to reach the pod IP targets on hybrid nodes, your on-premises pod CIDR must be routable on your on-premises network. Additionally, the AWS Load Balancer Controller uses webhooks and requires direct communication from the EKS control plane. For more information, see [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md).

### Considerations
<a name="_considerations"></a>
+ See [Route application and HTTP traffic with Application Load Balancers](alb-ingress.md) and [Install AWS Load Balancer Controller with Helm](lbc-helm.md) for more information on AWS Application Load Balancer and AWS Load Balancer Controller.
+ See [Best Practices for Load Balancing](https://docs.aws.amazon.com/eks/latest/best-practices/load-balancing.html) for information on how to choose between AWS Application Load Balancer and AWS Network Load Balancer.
+ See [AWS Load Balancer Controller Ingress annotations](https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/ingress/annotations/) for the list of annotations that can be configured for Ingress resources with AWS Application Load Balancer.

### Prerequisites
<a name="_prerequisites"></a>
+ Cilium installed following the instructions in [Configure CNI for hybrid nodes](hybrid-nodes-cni.md).
+ Cilium BGP Control Plane enabled following the instructions in [Configure Cilium BGP for hybrid nodes](hybrid-nodes-cilium-bgp.md). If you do not want to use BGP, you must use an alternative method to make your on-premises pod CIDRs routable on your on-premises network. If you do not make your on-premises pod CIDRs routable, ALB will not be able to register or contact your pod IP targets.
+ Helm installed in your command-line environment, see the [Setup Helm instructions](helm.md) for more information.
+ eksctl installed in your command-line environment, see the [eksctl install instructions](install-kubectl.md#eksctl-install-update) for more information.

### Procedure
<a name="_procedure"></a>

1. Download an IAM policy for the AWS Load Balancer Controller that allows it to make calls to AWS APIs on your behalf.

   ```
   curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/refs/heads/main/docs/install/iam_policy.json
   ```

1. Create an IAM policy using the policy downloaded in the previous step.

   ```
   aws iam create-policy \
       --policy-name AWSLoadBalancerControllerIAMPolicy \
       --policy-document file://iam_policy.json
   ```

1. Replace the value for cluster name (`CLUSTER_NAME`), AWS Region (`AWS_REGION`), and AWS account ID (`AWS_ACCOUNT_ID`) with your settings and run the following command.

   ```
   eksctl create iamserviceaccount \
       --cluster=CLUSTER_NAME \
       --namespace=kube-system \
       --name=aws-load-balancer-controller \
       --attach-policy-arn=arn:aws:iam::AWS_ACCOUNT_ID:policy/AWSLoadBalancerControllerIAMPolicy \
       --override-existing-serviceaccounts \
       --region AWS_REGION \
       --approve
   ```

1. Add the eks-charts Helm chart repository and update your local Helm repository to make sure that you have the most recent charts.

   ```
   helm repo add eks https://aws.github.io/eks-charts
   ```

   ```
   helm repo update eks
   ```

1. Install the AWS Load Balancer Controller. Replace the value for cluster name (`CLUSTER_NAME`), AWS Region (`AWS_REGION`), VPC ID (`VPC_ID`), and AWS Load Balancer Controller Helm chart version (`AWS_LBC_HELM_VERSION`) with your settings and run the following command. If you are running a mixed mode cluster with both hybrid nodes and nodes in AWS Cloud, you can run the AWS Load Balancer Controller on cloud nodes following the instructions at [AWS Load Balancer Controller](hybrid-nodes-webhooks.md#hybrid-nodes-mixed-lbc).
   + You can find the latest version of the Helm chart by running `helm search repo eks/aws-load-balancer-controller --versions`.

     ```
     helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
       -n kube-system \
       --version AWS_LBC_HELM_VERSION \
       --set clusterName=CLUSTER_NAME \
       --set region=AWS_REGION \
       --set vpcId=VPC_ID \
       --set serviceAccount.create=false \
       --set serviceAccount.name=aws-load-balancer-controller
     ```

1. Verify the AWS Load Balancer Controller was installed successfully.

   ```
   kubectl get -n kube-system deployment aws-load-balancer-controller
   ```

   ```
   NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
   aws-load-balancer-controller   2/2     2            2           84s
   ```

1. Create a sample application. The example below uses the [Istio Bookinfo](https://istio.io/latest/docs/examples/bookinfo/) sample microservices application.

   ```
   kubectl apply -f https://raw.githubusercontent.com/istio/istio/refs/heads/master/samples/bookinfo/platform/kube/bookinfo.yaml
   ```

1. Create a file named `my-ingress-alb.yaml` with the following contents.

   ```
   apiVersion: networking.k8s.io/v1
   kind: Ingress
   metadata:
     name: my-ingress
     namespace: default
     annotations:
       alb.ingress.kubernetes.io/load-balancer-name: "my-ingress-alb"
       alb.ingress.kubernetes.io/target-type: "ip"
       alb.ingress.kubernetes.io/scheme: "internet-facing"
       alb.ingress.kubernetes.io/healthcheck-path: "/details/1"
   spec:
     ingressClassName: alb
     rules:
     - http:
         paths:
         - backend:
             service:
               name: details
               port:
                 number: 9080
           path: /details
           pathType: Prefix
   ```

1. Apply the Ingress configuration to your cluster.

   ```
   kubectl apply -f my-ingress-alb.yaml
   ```

1. Provisioning the ALB for your Ingress resource may take a few minutes. Once the ALB is provisioned, your Ingress resource will have an address assigned to it that corresponds to the DNS name of the ALB deployment. The address will have the format `<alb-name>-<random-string>.<region>.elb.amazonaws.com`.

   ```
   kubectl get ingress my-ingress
   ```

   ```
   NAME         CLASS   HOSTS   ADDRESS                                                     PORTS   AGE
   my-ingress   alb     *       my-ingress-alb-<random-string>.<region>.elb.amazonaws.com   80      23m
   ```

1. Access the Service using the address of the ALB.

   ```
   curl -s http://my-ingress-alb-<random-string>.<region>.elb.amazonaws.com:80/details/1 | jq
   ```

   ```
   {
     "id": 1,
     "author": "William Shakespeare",
     "year": 1595,
     "type": "paperback",
     "pages": 200,
     "publisher": "PublisherA",
     "language": "English",
     "ISBN-10": "1234567890",
     "ISBN-13": "123-1234567890"
     "details": "This is the details page"
   }
   ```

## Cilium Ingress and Cilium Gateway Overview
<a name="hybrid-nodes-ingress-cilium"></a>

Cilium’s Ingress capabilities are built into Cilium’s architecture and can be managed with the Kubernetes Ingress API or Gateway API. If you don’t have existing Ingress resources, AWS recommends starting with the Gateway API, as it is a more expressive and flexible way to define and manage Kubernetes networking resources. The [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/) aims to standardize how networking resources for Ingress, Load Balancing, and Service Mesh are defined and managed in Kubernetes clusters.

When you enable Cilium’s Ingress or Gateway features, the Cilium operator reconciles Ingress / Gateway objects in the cluster and Envoy proxies on each node process the Layer 7 (L7) network traffic. Cilium does not directly provision Ingress / Gateway infrastructure such as load balancers. If you plan to use Cilium Ingress / Gateway with a load balancer, you must use the load balancer’s tooling, commonly an Ingress or Gateway controller, to deploy and manage the load balancer’s infrastructure.

For Ingress / Gateway traffic, Cilium handles the core network traffic and L3/L4 policy enforcement, and integrated Envoy proxies process the L7 network traffic. With Cilium Ingress / Gateway, Envoy is responsible for applying L7 routing rules, policies, and request manipulation, advanced traffic management such as traffic splitting and mirroring, and TLS termination and origination. Cilium’s Envoy proxies are deployed as a separate DaemonSet (`cilium-envoy`) by default, which enables Envoy and the Cilium agent to be separately updated, scaled, and managed.
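
Because the Envoy proxies run in their own DaemonSet by default, you can inspect and manage them separately from the Cilium agents. As an illustrative check, assuming the default `cilium-envoy` DaemonSet name and the `kube-system` namespace used elsewhere on this page:

```
kubectl -n kube-system get daemonset cilium-envoy
```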

For more information on how Cilium Ingress and Cilium Gateway work, see the [Cilium Ingress](https://docs.cilium.io/en/stable/network/servicemesh/ingress/) and [Cilium Gateway](https://docs.cilium.io/en/stable/network/servicemesh/gateway-api/gateway-api/) pages in the Cilium documentation.

## Cilium Ingress and Gateway Comparison
<a name="hybrid-nodes-ingress-cilium-comparison"></a>

The table below summarizes the Cilium Ingress and Cilium Gateway features as of **Cilium version 1.17.x**.


| Feature | Ingress | Gateway | 
| --- | --- | --- | 
|  Service type LoadBalancer  |  Yes  |  Yes  | 
|  Service type NodePort  |  Yes  |  No¹  | 
|  Host network  |  Yes  |  Yes  | 
|  Shared load balancer  |  Yes  |  Yes  | 
|  Dedicated load balancer  |  Yes  |  No²  | 
|  Network policies  |  Yes  |  Yes  | 
|  Protocols  |  Layer 7 (HTTP(S), gRPC)  |  Layer 7 (HTTP(S), gRPC)³  | 
|  TLS Passthrough  |  Yes  |  Yes  | 
|  Traffic Management  |  Path and Host routing  |  Path and Host routing, URL redirect and rewrite, traffic splitting, header modification  | 

 ¹ Cilium Gateway support for NodePort services is planned for Cilium version 1.18.x ([#27273](https://github.com/cilium/cilium/pull/27273))

 ² Cilium Gateway support for dedicated load balancers is tracked in [#25567](https://github.com/cilium/cilium/issues/25567)

 ³ Cilium Gateway support for TCP/UDP is tracked in [#21929](https://github.com/cilium/cilium/issues/21929)

## Install Cilium Gateway
<a name="hybrid-nodes-ingress-cilium-gateway-install"></a>

### Considerations
<a name="_considerations_2"></a>
+ Cilium must be configured with `nodePort.enabled` set to `true` as shown in the examples below. If you are using Cilium’s kube-proxy replacement feature, you do not need to set `nodePort.enabled` to `true`.
+ Cilium must be configured with `envoy.enabled` set to `true` as shown in the examples below.
+ Cilium Gateway can be deployed in load balancer (default) or host network mode.
+ When using Cilium Gateway in load balancer mode, the `service.beta.kubernetes.io/aws-load-balancer-type: "external"` annotation must be set on the Gateway resource to prevent the legacy AWS cloud provider from creating a Classic Load Balancer for the Service of type LoadBalancer that Cilium creates for the Gateway resource.
+ When using Cilium Gateway in host network mode, the Service of type LoadBalancer mode is disabled. Host network mode is useful for environments that do not have load balancer infrastructure. See [Host network](#hybrid-nodes-ingress-cilium-host-network) for more information.

### Prerequisites
<a name="_prerequisites_2"></a>

1. Helm installed in your command-line environment, see [Setup Helm instructions](helm.md).

1. Cilium installed following the instructions in [Configure CNI for hybrid nodes](hybrid-nodes-cni.md).

### Procedure
<a name="_procedure_2"></a>

1. Install the Kubernetes Gateway API Custom Resource Definitions (CRDs).

   ```
   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.1/config/crd/standard/gateway.networking.k8s.io_gatewayclasses.yaml
   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.1/config/crd/standard/gateway.networking.k8s.io_gateways.yaml
   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.1/config/crd/standard/gateway.networking.k8s.io_httproutes.yaml
   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.1/config/crd/standard/gateway.networking.k8s.io_referencegrants.yaml
   kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v1.2.1/config/crd/standard/gateway.networking.k8s.io_grpcroutes.yaml
   ```

1. Create a file called `cilium-gateway-values.yaml` with the following contents. The example below configures Cilium Gateway to use the default load balancer mode and to use a separate `cilium-envoy` DaemonSet for Envoy proxies configured to run only on hybrid nodes.

   ```
   gatewayAPI:
     enabled: true
     # uncomment to use host network mode
     # hostNetwork:
     #   enabled: true
   nodePort:
     enabled: true
   envoy:
     enabled: true
     affinity:
       nodeAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
           - matchExpressions:
             - key: eks.amazonaws.com/compute-type
               operator: In
               values:
               - hybrid
   ```

1. Apply the Helm values file to your cluster.

   ```
   helm upgrade cilium oci://public.ecr.aws/eks/cilium/cilium \
     --namespace kube-system \
     --reuse-values \
     --set operator.rollOutPods=true \
     --values cilium-gateway-values.yaml
   ```

1. Confirm the Cilium operator, agent, and Envoy pods are running.

   ```
   kubectl -n kube-system get pods --selector=app.kubernetes.io/part-of=cilium
   ```

   ```
   NAME                               READY   STATUS    RESTARTS   AGE
   cilium-envoy-5pgnd                 1/1     Running   0          6m31s
   cilium-envoy-6fhg4                 1/1     Running   0          6m30s
   cilium-envoy-jskrk                 1/1     Running   0          6m30s
   cilium-envoy-k2xtb                 1/1     Running   0          6m31s
   cilium-envoy-w5s9j                 1/1     Running   0          6m31s
   cilium-grwlc                       1/1     Running   0          4m12s
   cilium-operator-68f7766967-5nnbl   1/1     Running   0          4m20s
   cilium-operator-68f7766967-7spfz   1/1     Running   0          4m20s
   cilium-pnxcv                       1/1     Running   0          6m29s
   cilium-r7qkj                       1/1     Running   0          4m12s
   cilium-wxhfn                       1/1     Running   0          4m1s
   cilium-z7hlb                       1/1     Running   0          6m30s
   ```

## Configure Cilium Gateway
<a name="hybrid-nodes-ingress-cilium-gateway-configure"></a>

Cilium Gateway is enabled on Gateway objects by setting the `gatewayClassName` to `cilium`. The Service that Cilium creates for Gateway resources can be configured with fields on the Gateway object. Common annotations used by Gateway controllers to configure the load balancer infrastructure can be configured with the Gateway object’s `infrastructure` field. When using Cilium’s LoadBalancer IPAM (see example in [Service type LoadBalancer](#hybrid-nodes-ingress-cilium-loadbalancer)), the IP address to use for the Service of type LoadBalancer can be configured on the Gateway object’s `addresses` field. For more information on Gateway configuration, see the [Kubernetes Gateway API specification](https://gateway-api.sigs.k8s.io/reference/spec/#gateway).

```
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway
spec:
  gatewayClassName: cilium
  infrastructure:
    annotations:
      service.beta.kubernetes.io/...
      service.kubernetes.io/...
  addresses:
  - type: IPAddress
    value: <LoadBalancer IP address>
  listeners:
  ...
```

Cilium and the Kubernetes Gateway specification support the GatewayClass, Gateway, HTTPRoute, GRPCRoute, and ReferenceGrant resources.
+ See [HTTPRoute](https://gateway-api.sigs.k8s.io/api-types/httproute/HTTPRoute) and [GRPCRoute](https://gateway-api.sigs.k8s.io/api-types/grpcroute/GRPCRoute) specifications for the list of available fields.
+ See the examples in the [Deploy Cilium Gateway](#hybrid-nodes-ingress-cilium-gateway-deploy) section below and the examples in the [Cilium documentation](https://docs.cilium.io/en/stable/network/servicemesh/gateway-api/gateway-api/#examples) for how to use and configure these resources.

## Deploy Cilium Gateway
<a name="hybrid-nodes-ingress-cilium-gateway-deploy"></a>

1. Create a sample application. The example below uses the [Istio Bookinfo](https://istio.io/latest/docs/examples/bookinfo/) sample microservices application.

   ```
   kubectl apply -f https://raw.githubusercontent.com/istio/istio/refs/heads/master/samples/bookinfo/platform/kube/bookinfo.yaml
   ```

1. Confirm the application is running successfully.

   ```
   kubectl get pods
   ```

   ```
   NAME                              READY   STATUS    RESTARTS   AGE
   details-v1-766844796b-9965p       1/1     Running   0          81s
   productpage-v1-54bb874995-jmc8j   1/1     Running   0          80s
   ratings-v1-5dc79b6bcd-smzxz       1/1     Running   0          80s
   reviews-v1-598b896c9d-vj7gb       1/1     Running   0          80s
   reviews-v2-556d6457d-xbt8v        1/1     Running   0          80s
   reviews-v3-564544b4d6-cpmvq       1/1     Running   0          80s
   ```

1. Create a file named `my-gateway.yaml` with the following contents. The example below uses the `service.beta.kubernetes.io/aws-load-balancer-type: "external"` annotation to prevent the legacy AWS cloud provider from creating a Classic Load Balancer for the Service of type LoadBalancer that Cilium creates for the Gateway resource.

   ```
   ---
   apiVersion: gateway.networking.k8s.io/v1
   kind: Gateway
   metadata:
     name: my-gateway
   spec:
     gatewayClassName: cilium
     infrastructure:
       annotations:
         service.beta.kubernetes.io/aws-load-balancer-type: "external"
     listeners:
     - protocol: HTTP
       port: 80
       name: web-gw
       allowedRoutes:
         namespaces:
           from: Same
   ---
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: http-app-1
   spec:
     parentRefs:
     - name: my-gateway
       namespace: default
     rules:
     - matches:
       - path:
           type: PathPrefix
           value: /details
       backendRefs:
       - name: details
         port: 9080
   ```

1. Apply the Gateway resource to your cluster.

   ```
   kubectl apply -f my-gateway.yaml
   ```

1. Confirm the Gateway resource and corresponding Service were created. At this stage, it is expected that the `ADDRESS` field of the Gateway resource is not populated with an IP address or hostname, and that the Service of type LoadBalancer for the Gateway resource similarly does not have an IP address or hostname assigned.

   ```
   kubectl get gateway my-gateway
   ```

   ```
   NAME         CLASS    ADDRESS   PROGRAMMED   AGE
   my-gateway   cilium             True         10s
   ```

   ```
   kubectl get svc cilium-gateway-my-gateway
   ```

   ```
   NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
   cilium-gateway-my-gateway   LoadBalancer   172.16.227.247   <pending>     80:30912/TCP   24s
   ```

1. Proceed to [Service type LoadBalancer](#hybrid-nodes-ingress-cilium-loadbalancer) to configure the Gateway resource to use an IP address allocated by Cilium Load Balancer IPAM, and [Service type NodePort](#hybrid-nodes-ingress-cilium-nodeport) or [Host network](#hybrid-nodes-ingress-cilium-host-network) to configure the Gateway resource to use NodePort or host network addresses.

## Install Cilium Ingress
<a name="hybrid-nodes-ingress-cilium-ingress-install"></a>

### Considerations
<a name="_considerations_3"></a>
+ Cilium must be configured with `nodePort.enabled` set to `true` as shown in the examples below. If you are using Cilium’s kube-proxy replacement feature, you do not need to set `nodePort.enabled` to `true`.
+ Cilium must be configured with `envoy.enabled` set to `true` as shown in the examples below.
+ With `ingressController.loadbalancerMode` set to `dedicated`, Cilium creates dedicated Services for each Ingress resource. With `ingressController.loadbalancerMode` set to `shared`, Cilium creates a shared Service of type LoadBalancer for all Ingress resources in the cluster. When using the `shared` load balancer mode, the settings for the shared Service such as `labels`, `annotations`, `type`, and `loadBalancerIP` are configured in the `ingressController.service` section of the Helm values. See the [Cilium Helm values reference](https://github.com/cilium/cilium/blob/v1.17.6/install/kubernetes/cilium/values.yaml#L887) for more information. A minimal sketch of shared-mode Helm values is shown after this list.
+ With `ingressController.default` set to `true`, Cilium is configured as the default Ingress controller for the cluster and will create Ingress entries even when the `ingressClassName` is not specified on Ingress resources.
+ Cilium Ingress can be deployed in load balancer (default), node port, or host network mode. When Cilium is installed in host network mode, the Service of type LoadBalancer and Service of type NodePort modes are disabled. See [Host network](#hybrid-nodes-ingress-cilium-host-network) for more information.
+ Always set `ingressController.service.annotations` to `service.beta.kubernetes.io/aws-load-balancer-type: "external"` in the Helm values to prevent the legacy AWS cloud provider from creating a Classic Load Balancer for the default `cilium-ingress` Service created by the [Cilium Helm chart](https://github.com/cilium/cilium/blob/main/install/kubernetes/cilium/templates/cilium-ingress-service.yaml).
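
For example, a minimal sketch of Helm values for the `shared` load balancer mode, assuming you want to apply the `service.beta.kubernetes.io/aws-load-balancer-type: "external"` annotation described above to the shared Service; values other than those named in this section are illustrative.

```
ingressController:
  enabled: true
  loadbalancerMode: shared
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "external"
```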

### Prerequisites
<a name="_prerequisites_3"></a>

1. Helm installed in your command-line environment, see [Setup Helm instructions](helm.md).

1. Cilium installed following the instructions in [Configure CNI for hybrid nodes](hybrid-nodes-cni.md).

### Procedure
<a name="_procedure_3"></a>

1. Create a file called `cilium-ingress-values.yaml` with the following contents. The example below configures Cilium Ingress to use the default load balancer `dedicated` mode and to use a separate `cilium-envoy` DaemonSet for Envoy proxies configured to run only on hybrid nodes.

   ```
   ingressController:
     enabled: true
     loadbalancerMode: dedicated
     service:
       annotations:
         service.beta.kubernetes.io/aws-load-balancer-type: "external"
   nodePort:
     enabled: true
   envoy:
     enabled: true
     affinity:
       nodeAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
           - matchExpressions:
             - key: eks.amazonaws.com/compute-type
               operator: In
               values:
               - hybrid
   ```

1. Apply the Helm values file to your cluster.

   ```
   helm upgrade cilium oci://public.ecr.aws/eks/cilium/cilium \
     --namespace kube-system \
     --reuse-values \
     --set operator.rollOutPods=true \
     --values cilium-ingress-values.yaml
   ```

1. Confirm the Cilium operator, agent, and Envoy pods are running.

   ```
   kubectl -n kube-system get pods --selector=app.kubernetes.io/part-of=cilium
   ```

   ```
   NAME                               READY   STATUS    RESTARTS   AGE
   cilium-envoy-5pgnd                 1/1     Running   0          6m31s
   cilium-envoy-6fhg4                 1/1     Running   0          6m30s
   cilium-envoy-jskrk                 1/1     Running   0          6m30s
   cilium-envoy-k2xtb                 1/1     Running   0          6m31s
   cilium-envoy-w5s9j                 1/1     Running   0          6m31s
   cilium-grwlc                       1/1     Running   0          4m12s
   cilium-operator-68f7766967-5nnbl   1/1     Running   0          4m20s
   cilium-operator-68f7766967-7spfz   1/1     Running   0          4m20s
   cilium-pnxcv                       1/1     Running   0          6m29s
   cilium-r7qkj                       1/1     Running   0          4m12s
   cilium-wxhfn                       1/1     Running   0          4m1s
   cilium-z7hlb                       1/1     Running   0          6m30s
   ```

## Configure Cilium Ingress
<a name="hybrid-nodes-ingress-cilium-ingress-configure"></a>

Cilium Ingress is enabled on Ingress objects by setting the `ingressClassName` to `cilium`. The Service(s) that Cilium creates for Ingress resources can be configured with annotations on the Ingress objects when using the `dedicated` load balancer mode and in the Cilium / Helm configuration when using the `shared` load balancer mode. These annotations are commonly used by Ingress controllers to configure the load balancer infrastructure, or other attributes of the Service such as the service type, load balancer mode, ports, and TLS passthrough. Key annotations are described below. For a full list of supported annotations, see the [Cilium Ingress annotations](https://docs.cilium.io/en/stable/network/servicemesh/ingress/#supported-ingress-annotations) in the Cilium documentation.


| Annotation | Description | 
| --- | --- | 
|   `ingress.cilium.io/loadbalancer-mode`   |   `dedicated`: Dedicated Service of type LoadBalancer for each Ingress resource (default).  `shared`: Single Service of type LoadBalancer for all Ingress resources.  | 
|   `ingress.cilium.io/service-type`   |   `LoadBalancer`: The Service will be of type LoadBalancer (default)  `NodePort`: The Service will be of type NodePort.  | 
|   `service.beta.kubernetes.io/aws-load-balancer-type`   |   `"external"`: Prevent legacy AWS cloud provider from provisioning Classic Load Balancer for Services of type LoadBalancer.  | 
|   `lbipam.cilium.io/ips`   |  List of IP addresses to allocate from Cilium LoadBalancer IPAM  | 

Cilium and the Kubernetes Ingress specification support Exact, Prefix, and Implementation-specific matching rules for Ingress paths. Cilium supports regex as its implementation-specific matching rule. For more information, see [Ingress path types and precedence](https://docs.cilium.io/en/stable/network/servicemesh/ingress/#ingress-path-types-and-precedence) and [Path types examples](https://docs.cilium.io/en/stable/network/servicemesh/path-types/) in the Cilium documentation, and the examples in the [Deploy Cilium Ingress](#hybrid-nodes-ingress-cilium-ingress-deploy) section of this page.
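
For example, a minimal sketch of an Ingress rule that uses Cilium’s implementation-specific (regex) matching; the path pattern and backend below are illustrative and reuse the sample `details` Service from this page.

```
rules:
- http:
    paths:
    - path: /details/.*
      pathType: ImplementationSpecific
      backend:
        service:
          name: details
          port:
            number: 9080
```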

An example Cilium Ingress object is shown below.

```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    service.beta.kubernetes.io/...
    service.kubernetes.io/...
spec:
  ingressClassName: cilium
  rules:
  ...
```

## Deploy Cilium Ingress
<a name="hybrid-nodes-ingress-cilium-ingress-deploy"></a>

1. Create a sample application. The example below uses the [Istio Bookinfo](https://istio.io/latest/docs/examples/bookinfo/) sample microservices application.

   ```
   kubectl apply -f https://raw.githubusercontent.com/istio/istio/refs/heads/master/samples/bookinfo/platform/kube/bookinfo.yaml
   ```

1. Confirm the application is running successfully.

   ```
   kubectl get pods
   ```

   ```
   NAME                              READY   STATUS    RESTARTS   AGE
   details-v1-766844796b-9965p       1/1     Running   0          81s
   productpage-v1-54bb874995-jmc8j   1/1     Running   0          80s
   ratings-v1-5dc79b6bcd-smzxz       1/1     Running   0          80s
   reviews-v1-598b896c9d-vj7gb       1/1     Running   0          80s
   reviews-v2-556d6457d-xbt8v        1/1     Running   0          80s
   reviews-v3-564544b4d6-cpmvq       1/1     Running   0          80s
   ```

1. Create a file named `my-ingress.yaml` with the following contents. The example below uses the `service.beta.kubernetes.io/aws-load-balancer-type: "external"` annotation to prevent the legacy AWS cloud provider from creating a Classic Load Balancer for the Service of type LoadBalancer that Cilium creates for the Ingress resource.

   ```
   apiVersion: networking.k8s.io/v1
   kind: Ingress
   metadata:
     name: my-ingress
     namespace: default
     annotations:
       service.beta.kubernetes.io/aws-load-balancer-type: "external"
   spec:
     ingressClassName: cilium
     rules:
     - http:
         paths:
         - backend:
             service:
               name: details
               port:
                 number: 9080
           path: /details
           pathType: Prefix
   ```

1. Apply the Ingress resource to your cluster.

   ```
   kubectl apply -f my-ingress.yaml
   ```

1. Confirm the Ingress resource and corresponding Service were created. At this stage, it is expected that the `ADDRESS` field of the Ingress resource is not populated with an IP address or hostname, and that the shared or dedicated Service of type LoadBalancer for the Ingress resource similarly does not have an IP address or hostname assigned.

   ```
   kubectl get ingress my-ingress
   ```

   ```
   NAME         CLASS    HOSTS   ADDRESS   PORTS   AGE
   my-ingress   cilium   *                 80      8s
   ```

   For load balancer mode `shared` 

   ```
   kubectl -n kube-system get svc cilium-ingress
   ```

   ```
   NAME             TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
   cilium-ingress   LoadBalancer   172.16.217.48   <pending>     80:32359/TCP,443:31090/TCP   10m
   ```

   For load balancer mode `dedicated` 

   ```
   kubectl -n default get svc cilium-ingress-my-ingress
   ```

   ```
   NAME                        TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
   cilium-ingress-my-ingress   LoadBalancer   172.16.193.15   <pending>     80:32088/TCP,443:30332/TCP   25s
   ```

1. Proceed to [Service type LoadBalancer](#hybrid-nodes-ingress-cilium-loadbalancer) to configure the Ingress resource to use an IP address allocated by Cilium Load Balancer IPAM, and [Service type NodePort](#hybrid-nodes-ingress-cilium-nodeport) or [Host network](#hybrid-nodes-ingress-cilium-host-network) to configure the Ingress resource to use NodePort or host network addresses.

## Service type LoadBalancer
<a name="hybrid-nodes-ingress-cilium-loadbalancer"></a>

### Existing load balancer infrastructure
<a name="_existing_load_balancer_infrastructure"></a>

By default, for both Cilium Ingress and Cilium Gateway, Cilium creates Kubernetes Service(s) of type LoadBalancer for the Ingress / Gateway resources. The attributes of the Service(s) that Cilium creates can be configured through the Ingress and Gateway resources. When you create Ingress or Gateway resources, the externally exposed IP address or hostnames for the Ingress or Gateway are allocated from the load balancer infrastructure, which is typically provisioned by an Ingress or Gateway controller.

Many Ingress and Gateway controllers use annotations to detect and configure the load balancer infrastructure. The annotations for these Ingress and Gateway controllers are configured on the Ingress or Gateway resources as shown in the previous examples above. Reference your Ingress or Gateway controller’s documentation for the annotations it supports and see the [Kubernetes Ingress documentation](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/) and [Kubernetes Gateway documentation](https://gateway-api.sigs.k8s.io/implementations/) for a list of popular controllers.

**Important**  
Cilium Ingress and Gateway cannot be used with the AWS Load Balancer Controller and AWS Network Load Balancers (NLBs) with EKS Hybrid Nodes. Attempting to use these together results in unregistered targets, as the NLB attempts to connect directly to the Pod IPs that back the Service of type LoadBalancer when the NLB’s `target-type` is set to `ip` (a requirement for using NLB with workloads running on EKS Hybrid Nodes).

### No load balancer infrastructure
<a name="_no_load_balancer_infrastructure"></a>

If you do not have load balancer infrastructure and corresponding Ingress / Gateway controller in your environment, Ingress / Gateway resources and corresponding Services of type LoadBalancer can be configured to use IP addresses allocated by Cilium’s [Load Balancer IP address management](https://docs.cilium.io/en/stable/network/lb-ipam/) (LB IPAM). Cilium LB IPAM can be configured with known IP address ranges from your on-premises environment, and can use Cilium’s built-in Border Gateway Protocol (BGP) support or L2 announcements to advertise the LoadBalancer IP addresses to your on-premises network.

The example below shows how to configure Cilium’s LB IPAM with an IP address to use for your Ingress / Gateway resources, and how to configure Cilium BGP Control Plane to advertise the LoadBalancer IP address with the on-premises network. Cilium’s LB IPAM feature is enabled by default, but is not activated until a `CiliumLoadBalancerIPPool` resource is created.

#### Prerequisites
<a name="_prerequisites_4"></a>
+ Cilium Ingress or Gateway installed following the instructions in [Install Cilium Ingress](#hybrid-nodes-ingress-cilium-ingress-install) or [Install Cilium Gateway](#hybrid-nodes-ingress-cilium-gateway-install).
+ Cilium Ingress or Gateway resources with sample application deployed following the instructions in [Deploy Cilium Ingress](#hybrid-nodes-ingress-cilium-ingress-deploy) or [Deploy Cilium Gateway](#hybrid-nodes-ingress-cilium-gateway-deploy).
+ Cilium BGP Control Plane enabled following the instructions in [Configure Cilium BGP for hybrid nodes](hybrid-nodes-cilium-bgp.md). If you do not want to use BGP, you can skip this prerequisite, but you will not be able to access your Ingress or Gateway resource until the LoadBalancer IP address allocated by Cilium LB IPAM is routable on your on-premises network.

#### Procedure
<a name="_procedure_4"></a>

1. Optionally patch the Ingress or Gateway resource to request a specific IP address to use for the Service of type LoadBalancer. If you do not request a specific IP address, Cilium will allocate an IP address from the IP address range configured in the `CiliumLoadBalancerIPPool` resource in the subsequent step. In the commands below, replace `LB_IP_ADDRESS` with the IP address to request for the Service of type LoadBalancer.

    **Gateway** 

   ```
   kubectl patch gateway -n default my-gateway --type=merge -p '{
     "spec": {
       "addresses": [{"type": "IPAddress", "value": "LB_IP_ADDRESS"}]
     }
   }'
   ```

    **Ingress** 

   ```
   kubectl patch ingress my-ingress --type=merge -p '{
     "metadata": {"annotations": {"lbipam.cilium.io/ips": "LB_IP_ADDRESS"}}
   }'
   ```

1. Create a file named `cilium-lbip-pool-ingress.yaml` with a `CiliumLoadBalancerIPPool` resource to configure the Load Balancer IP address range for your Ingress / Gateway resources.
   + If you are using Cilium Ingress, Cilium automatically applies the `cilium.io/ingress: "true"` label to the Services it creates for Ingress resources. You can use this label in the `serviceSelector` field of the `CiliumLoadBalancerIPPool` resource definition to select the Services eligible for LB IPAM.
   + If you are using Cilium Gateway, you can use the `gateway.networking.k8s.io/gateway-name` label in the `serviceSelector` fields of the `CiliumLoadBalancerIPPool` resource definition to select the Gateway resources eligible for LB IPAM.
   + Replace `LB_IP_CIDR` with the IP address range to use for the Load Balancer IP addresses. To select a single IP address, use a `/32` CIDR. For more information, see [LoadBalancer IP Address Management](https://docs.cilium.io/en/stable/network/lb-ipam/) in the Cilium documentation.

     ```
     apiVersion: cilium.io/v2alpha1
     kind: CiliumLoadBalancerIPPool
     metadata:
       name: bookinfo-pool
     spec:
       blocks:
       - cidr: "LB_IP_CIDR"
       serviceSelector:
         # if using Cilium Gateway
         matchExpressions:
           - { key: gateway.networking.k8s.io/gateway-name, operator: In, values: [ my-gateway ] }
         # if using Cilium Ingress
         matchLabels:
           cilium.io/ingress: "true"
     ```

1. Apply the `CiliumLoadBalancerIPPool` resource to your cluster.

   ```
   kubectl apply -f cilium-lbip-pool-ingress.yaml
   ```

1. Confirm an IP address was allocated from Cilium LB IPAM for the Ingress / Gateway resource.

    **Gateway** 

   ```
   kubectl get gateway my-gateway
   ```

   ```
   NAME         CLASS    ADDRESS        PROGRAMMED   AGE
   my-gateway   cilium   LB_IP_ADDRESS    True         6m41s
   ```

    **Ingress** 

   ```
   kubectl get ingress my-ingress
   ```

   ```
   NAME         CLASS    HOSTS   ADDRESS        PORTS   AGE
   my-ingress   cilium   *       LB_IP_ADDRESS   80      10m
   ```

1. Create a file named `cilium-bgp-advertisement-ingress.yaml` with a `CiliumBGPAdvertisement` resource to advertise the LoadBalancer IP address for the Ingress / Gateway resources. If you are not using Cilium BGP, you can skip this step. The LoadBalancer IP address used for your Ingress / Gateway resource must be routable on your on-premises network for you to be able to query the service in the next step.

   ```
   apiVersion: cilium.io/v2alpha1
   kind: CiliumBGPAdvertisement
   metadata:
     name: bgp-advertisement-lb-ip
     labels:
       advertise: bgp
   spec:
     advertisements:
       - advertisementType: "Service"
         service:
           addresses:
             - LoadBalancerIP
         selector:
           # if using Cilium Gateway
           matchExpressions:
             - { key: gateway.networking.k8s.io/gateway-name, operator: In, values: [ my-gateway ] }
           # if using Cilium Ingress
           matchLabels:
             cilium.io/ingress: "true"
   ```

1. Apply the `CiliumBGPAdvertisement` resource to your cluster.

   ```
   kubectl apply -f cilium-bgp-advertisement-ingress.yaml
   ```

1. Access the service using the IP address allocated from Cilium LB IPAM.

   ```
   curl -s http://LB_IP_ADDRESS:80/details/1 | jq
   ```

   ```
   {
     "id": 1,
     "author": "William Shakespeare",
     "year": 1595,
     "type": "paperback",
     "pages": 200,
     "publisher": "PublisherA",
     "language": "English",
     "ISBN-10": "1234567890",
     "ISBN-13": "123-1234567890"
   }
   ```

## Service type NodePort
<a name="hybrid-nodes-ingress-cilium-nodeport"></a>

If you do not have load balancer infrastructure and corresponding Ingress controller in your environment, or if you are self-managing your load balancer infrastructure or using DNS-based load balancing, you can configure Cilium Ingress to create Services of type NodePort for the Ingress resources. When using NodePort with Cilium Ingress, the Service of type NodePort is exposed on a port on each node in the 30000-32767 port range. In this mode, when traffic reaches any node in the cluster on the NodePort, it is then forwarded to a pod that backs the service, which may be on the same node or a different node.

**Note**  
Cilium Gateway support for NodePort services is planned for Cilium version 1.18.x ([#27273](https://github.com/cilium/cilium/pull/27273))

### Prerequisites
<a name="_prerequisites_5"></a>
+ Cilium Ingress installed following the instructions in [Install Cilium Ingress](#hybrid-nodes-ingress-cilium-ingress-install).
+ Cilium Ingress resources with sample application deployed following the instructions in [Deploy Cilium Ingress](#hybrid-nodes-ingress-cilium-ingress-deploy).

### Procedure
<a name="_procedure_5"></a>

1. Patch the existing Ingress resource `my-ingress` to change it from Service type LoadBalancer to NodePort.

   ```
   kubectl patch ingress my-ingress --type=merge -p '{
       "metadata": {"annotations": {"ingress.cilium.io/service-type": "NodePort"}}
   }'
   ```

   If you have not created the Ingress resource, you can create it by applying the following Ingress definition to your cluster. Note that the Ingress definition below uses the Istio Bookinfo sample application described in [Deploy Cilium Ingress](#hybrid-nodes-ingress-cilium-ingress-deploy).

   ```
   apiVersion: networking.k8s.io/v1
   kind: Ingress
   metadata:
     name: my-ingress
     namespace: default
     annotations:
       service.beta.kubernetes.io/aws-load-balancer-type: "external"
       "ingress.cilium.io/service-type": "NodePort"
   spec:
     ingressClassName: cilium
     rules:
     - http:
         paths:
         - backend:
             service:
               name: details
               port:
                 number: 9080
           path: /details
           pathType: Prefix
   ```

1. Confirm the Service for the Ingress resource was updated to use Service type NodePort. Note the port for the HTTP protocol in the output. In the example below, this HTTP port is `32353`, which will be used in a subsequent step to query the Service. The benefit of using Cilium Ingress with a Service of type NodePort is that you can apply path- and host-based routing, as well as network policies, to the Ingress traffic, which you cannot do with a standard Service of type NodePort without Ingress.

   ```
   kubectl -n default get svc cilium-ingress-my-ingress
   ```

   ```
   NAME                        TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
   cilium-ingress-my-ingress   NodePort   172.16.47.153   <none>        80:32353/TCP,443:30253/TCP   27m
   ```

1. Get the IP addresses of your nodes in your cluster.

   ```
   kubectl get nodes -o wide
   ```

   ```
   NAME                   STATUS   ROLES    AGE   VERSION               INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
   mi-026d6a261e355fba7   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.150   <none>        Ubuntu 22.04.5 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-082f73826a163626e   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.32    <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-09183e8a3d755abf6   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.33    <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-0d78d815980ed202d   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.97    <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-0daa253999fe92daa   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.100   <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   ```

1. Access the Service of type NodePort using the IP addresses of your nodes and the NodePort captured above. In the example below the node IP address used is `10.80.146.32` and the NodePort is `32353`. Replace these with the values for your environment.

   ```
   curl -s http://10.80.146.32:32353/details/1 | jq
   ```

   ```
   {
     "id": 1,
     "author": "William Shakespeare",
     "year": 1595,
     "type": "paperback",
     "pages": 200,
     "publisher": "PublisherA",
     "language": "English",
     "ISBN-10": "1234567890",
     "ISBN-13": "123-1234567890"
   }
   ```

## Host network
<a name="hybrid-nodes-ingress-cilium-host-network"></a>

Similar to the Service of type NodePort approach, if you do not have load balancer infrastructure and an Ingress or Gateway controller, or if you are self-managing your load balancing with an external load balancer, you can configure Cilium Ingress and Cilium Gateway to expose Ingress and Gateway resources directly on the host network. When host network mode is enabled for an Ingress or Gateway resource, the Service of type LoadBalancer and NodePort modes are automatically disabled; host network mode is mutually exclusive with these alternative modes for each Ingress or Gateway resource. Compared to the Service of type NodePort mode, host network mode offers additional flexibility for the range of ports that can be used (it’s not restricted to the 30000-32767 NodePort range), and you can configure a subset of nodes where the Envoy proxies run on the host network.

### Prerequisites
<a name="_prerequisites_6"></a>
+ Cilium Ingress or Gateway installed following the instructions in [Install Cilium Ingress](#hybrid-nodes-ingress-cilium-ingress-install) or [Install Cilium Gateway](#hybrid-nodes-ingress-cilium-gateway-install).

### Procedure
<a name="_procedure_6"></a>

#### Gateway
<a name="_gateway"></a>

1. Create a file named `cilium-gateway-host-network.yaml` with the following content.

   ```
   gatewayAPI:
     enabled: true
     hostNetwork:
       enabled: true
       # uncomment to restrict nodes where Envoy proxies run on the host network
       # nodes:
       #   matchLabels:
       #     role: gateway
   ```
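
   If you restrict where the Envoy proxies run on the host network by uncommenting the node selector, the selected nodes must carry the matching label. A minimal sketch, assuming the `role: gateway` label from the commented-out selector above and a node name from your cluster:

   ```
   kubectl label node mi-082f73826a163626e role=gateway
   ```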

1. Apply the host network Cilium Gateway configuration to your cluster.

   ```
   helm upgrade cilium oci://public.ecr.aws/eks/cilium/cilium \
     --namespace kube-system \
     --reuse-values \
     --set operator.rollOutPods=true \
     -f cilium-gateway-host-network.yaml
   ```

   If you have not created the Gateway resource, you can create it by applying the following Gateway definition to your cluster. The Gateway definition below uses the Istio Bookinfo sample application described in [Deploy Cilium Gateway](#hybrid-nodes-ingress-cilium-gateway-deploy). In the example below, the Gateway resource is configured to use the `8111` port for the HTTP listener, which is the shared listener port for the Envoy proxies running on the host network. If you are using a privileged port (lower than 1023) for the Gateway resource, reference the [Cilium documentation](https://docs.cilium.io/en/stable/network/servicemesh/gateway-api/gateway-api/#bind-to-privileged-port) for instructions.

   ```
   ---
   apiVersion: gateway.networking.k8s.io/v1
   kind: Gateway
   metadata:
     name: my-gateway
   spec:
     gatewayClassName: cilium
     listeners:
     - protocol: HTTP
       port: 8111
       name: web-gw
       allowedRoutes:
         namespaces:
           from: Same
   ---
   apiVersion: gateway.networking.k8s.io/v1
   kind: HTTPRoute
   metadata:
     name: http-app-1
   spec:
     parentRefs:
     - name: my-gateway
       namespace: default
     rules:
     - matches:
       - path:
           type: PathPrefix
           value: /details
       backendRefs:
       - name: details
         port: 9080
   ```

   You can observe the applied Cilium Envoy Configuration with the following command.

   ```
   kubectl get cec cilium-gateway-my-gateway -o yaml
   ```

   You can get the Envoy listener port for the `cilium-gateway-my-gateway` Service with the following command. In this example, the shared listener port is `8111`.

   ```
   kubectl get cec cilium-gateway-my-gateway -o jsonpath={.spec.services[0].ports[0]}
   ```

1. Get the IP addresses of your nodes in your cluster.

   ```
   kubectl get nodes -o wide
   ```

   ```
   NAME                   STATUS   ROLES    AGE   VERSION               INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
   mi-026d6a261e355fba7   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.150   <none>        Ubuntu 22.04.5 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-082f73826a163626e   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.32    <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-09183e8a3d755abf6   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.33    <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-0d78d815980ed202d   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.97    <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-0daa253999fe92daa   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.100   <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   ```

1. Access the Service using the IP addresses of your nodes and the listener port for the `cilium-gateway-my-gateway` resource. In the example below the node IP address used is `10.80.146.32` and the listener port is `8111`. Replace these with the values for your environment.

   ```
   curl -s http://10.80.146.32:8111/details/1 | jq
   ```

   ```
   {
     "id": 1,
     "author": "William Shakespeare",
     "year": 1595,
     "type": "paperback",
     "pages": 200,
     "publisher": "PublisherA",
     "language": "English",
     "ISBN-10": "1234567890",
     "ISBN-13": "123-1234567890"
   }
   ```

#### Ingress
<a name="_ingress"></a>

Due to an upstream Cilium issue ([#34028](https://github.com/cilium/cilium/issues/34028)), Cilium Ingress in host network mode requires using `loadbalancerMode: shared`, which creates a single Service of type ClusterIP for all Ingress resources in the cluster. If you are using a privileged port (lower than 1023) for the Ingress resource, reference the [Cilium documentation](https://docs.cilium.io/en/stable/network/servicemesh/ingress/#bind-to-privileged-port) for instructions.

1. Create a file named `cilium-ingress-host-network.yaml` with the following content.

   ```
   ingressController:
     enabled: true
     loadbalancerMode: shared
     # This is a workaround for the upstream Cilium issue
     service:
       externalTrafficPolicy: null
       type: ClusterIP
     hostNetwork:
       enabled: true
       # ensure the port does not conflict with other services on the node
       sharedListenerPort: 8111
       # uncomment to restrict nodes where Envoy proxies run on the host network
       # nodes:
       #   matchLabels:
       #     role: ingress
   ```

1. Apply the host network Cilium Ingress configuration to your cluster.

   ```
   helm upgrade cilium oci://public.ecr.aws/eks/cilium/cilium \
     --namespace kube-system \
     --reuse-values \
     --set operator.rollOutPods=true \
     -f cilium-ingress-host-network.yaml
   ```

   If you have not created the Ingress resource, you can create it by applying the following Ingress definition to your cluster. The Ingress definition below uses the Istio Bookinfo sample application described in [Deploy Cilium Ingress](#hybrid-nodes-ingress-cilium-ingress-deploy).

   ```
   apiVersion: networking.k8s.io/v1
   kind: Ingress
   metadata:
     name: my-ingress
     namespace: default
   spec:
     ingressClassName: cilium
     rules:
     - http:
         paths:
         - backend:
             service:
               name: details
               port:
                 number: 9080
           path: /details
           pathType: Prefix
   ```

   You can observe the applied Cilium Envoy Configuration with the following command.

   ```
   kubectl get cec -n kube-system cilium-ingress -o yaml
   ```

   You can get the Envoy listener port for the `cilium-ingress` Service with the following command. In this example, the shared listener port is `8111`.

   ```
   kubectl get cec -n kube-system cilium-ingress -o jsonpath={.spec.services[0].ports[0]}
   ```

1. Get the IP addresses of your nodes in your cluster.

   ```
   kubectl get nodes -o wide
   ```

   ```
   NAME                   STATUS   ROLES    AGE   VERSION               INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
   mi-026d6a261e355fba7   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.150   <none>        Ubuntu 22.04.5 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-082f73826a163626e   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.32    <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-09183e8a3d755abf6   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.33    <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-0d78d815980ed202d   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.97    <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   mi-0daa253999fe92daa   Ready    <none>   23h   v1.32.3-eks-473151a   10.80.146.100   <none>        Ubuntu 22.04.4 LTS   5.15.0-142-generic   containerd://1.7.27
   ```

1. Access the Service using the IP addresses of your nodes and the `sharedListenerPort` for the `cilium-ingress` resource. In the example below the node IP address used is `10.80.146.32` and the listener port is `8111`. Replace these with the values for your environment.

   ```
   curl -s http://10.80.146.32:8111/details/1 | jq
   ```

   ```
   {
     "id": 1,
     "author": "William Shakespeare",
     "year": 1595,
     "type": "paperback",
     "pages": 200,
     "publisher": "PublisherA",
     "language": "English",
     "ISBN-10": "1234567890",
     "ISBN-13": "123-1234567890"
   }
   ```

# Configure Services of type LoadBalancer for hybrid nodes
<a name="hybrid-nodes-load-balancing"></a>

This topic describes how to configure Layer 4 (L4) load balancing for applications running on Amazon EKS Hybrid Nodes. Kubernetes Services of type LoadBalancer are used to expose Kubernetes applications external to the cluster. Services of type LoadBalancer are commonly used with physical load balancer infrastructure in the cloud or on-premises environment to serve the workload’s traffic. This load balancer infrastructure is commonly provisioned with an environment-specific controller.

 AWS supports AWS Network Load Balancer (NLB) and Cilium for Services of type LoadBalancer running on EKS Hybrid Nodes. The decision to use NLB or Cilium is based on the source of application traffic. If application traffic originates from an AWS Region, AWS recommends using AWS NLB and the AWS Load Balancer Controller. If application traffic originates from the local on-premises or edge environment, AWS recommends using Cilium’s built-in load balancing capabilities, which can be used with or without load balancer infrastructure in your environment.

For Layer 7 (L7) application traffic load balancing, see [Configure Kubernetes Ingress for hybrid nodes](hybrid-nodes-ingress.md). For general information on Load Balancing with EKS, see [Best Practices for Load Balancing](https://docs.aws.amazon.com/eks/latest/best-practices/load-balancing.html).

## AWS Network Load Balancer
<a name="hybrid-nodes-service-lb-nlb"></a>

You can use the [AWS Load Balancer Controller](aws-load-balancer-controller.md) and NLB with the target type `ip` for workloads running on hybrid nodes. When using target type `ip`, NLB forwards traffic directly to the pods, bypassing the Service layer network path. For NLB to reach the pod IP targets on hybrid nodes, your on-premises pod CIDRs must be routable on your on-premises network. Additionally, the AWS Load Balancer Controller uses webhooks and requires direct communication from the EKS control plane. For more information, see [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md).
+ See [Route TCP and UDP traffic with Network Load Balancers](network-load-balancing.md) for subnet configuration requirements, and [Install AWS Load Balancer Controller with Helm](lbc-helm.md) and [Best Practices for Load Balancing](https://docs.aws.amazon.com/eks/latest/best-practices/load-balancing.html) for additional information about AWS Network Load Balancer and AWS Load Balancer Controller.
+ See [AWS Load Balancer Controller NLB configurations](https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/service/nlb/) for configurations that can be applied to Services of type `LoadBalancer` with AWS Network Load Balancer.

### Prerequisites
<a name="_prerequisites"></a>
+ Cilium installed following the instructions in [Configure CNI for hybrid nodes](hybrid-nodes-cni.md).
+ Cilium BGP Control Plane enabled following the instructions in [Configure Cilium BGP for hybrid nodes](hybrid-nodes-cilium-bgp.md). If you do not want to use BGP, you must use an alternative method to make your on-premises pod CIDRs routable on your on-premises network, see [Routable remote Pod CIDRs](hybrid-nodes-concepts-kubernetes.md#hybrid-nodes-concepts-k8s-pod-cidrs) for more information.
+ Helm installed in your command-line environment, see [Setup Helm instructions](helm.md).
+ eksctl installed in your command-line environment, see [Setup eksctl instructions](install-kubectl.md#eksctl-install-update).

### Procedure
<a name="_procedure"></a>

1. Download an IAM policy for the AWS Load Balancer Controller that allows it to make calls to AWS APIs on your behalf.

   ```
   curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/refs/heads/main/docs/install/iam_policy.json
   ```

1. Create an IAM policy using the policy downloaded in the previous step.

   ```
   aws iam create-policy \
       --policy-name AWSLoadBalancerControllerIAMPolicy \
       --policy-document file://iam_policy.json
   ```

1. Replace the values for cluster name (`CLUSTER_NAME`), AWS Region (`AWS_REGION`), and AWS account ID (`AWS_ACCOUNT_ID`) with your settings and run the following command.

   ```
   eksctl create iamserviceaccount \
       --cluster=CLUSTER_NAME \
       --namespace=kube-system \
       --name=aws-load-balancer-controller \
       --attach-policy-arn=arn:aws:iam::AWS_ACCOUNT_ID:policy/AWSLoadBalancerControllerIAMPolicy \
       --override-existing-serviceaccounts \
       --region AWS_REGION \
       --approve
   ```

1. Add the eks-charts Helm chart repository. AWS maintains this repository on GitHub.

   ```
   helm repo add eks https://aws.github.io/eks-charts
   ```

1. Update your local Helm repository to make sure that you have the most recent charts.

   ```
   helm repo update eks
   ```

1. Install the AWS Load Balancer Controller. Replace the values for cluster name (`CLUSTER_NAME`), AWS Region (`AWS_REGION`), VPC ID (`VPC_ID`), and AWS Load Balancer Controller Helm chart version (`AWS_LBC_HELM_VERSION`) with your settings. You can find the latest version of the Helm chart by running `helm search repo eks/aws-load-balancer-controller --versions`. If you are running a mixed mode cluster with both hybrid nodes and nodes in AWS Cloud, you can run the AWS Load Balancer Controller on cloud nodes following the instructions at [AWS Load Balancer Controller](hybrid-nodes-webhooks.md#hybrid-nodes-mixed-lbc).

   ```
   helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
     -n kube-system \
     --version AWS_LBC_HELM_VERSION \
     --set clusterName=CLUSTER_NAME \
     --set region=AWS_REGION \
     --set vpcId=VPC_ID \
     --set serviceAccount.create=false \
     --set serviceAccount.name=aws-load-balancer-controller
   ```

1. Verify the AWS Load Balancer Controller was installed successfully.

   ```
   kubectl get -n kube-system deployment aws-load-balancer-controller
   ```

   ```
   NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
   aws-load-balancer-controller   2/2     2            2           84s
   ```

1. Define a sample application in a file named `tcp-sample-app.yaml`. The example below uses a simple NGINX deployment with a TCP port.

   ```
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: tcp-sample-app
     namespace: default
   spec:
     replicas: 3
     selector:
       matchLabels:
         app: nginx
     template:
       metadata:
         labels:
           app: nginx
       spec:
         containers:
           - name: nginx
             image: public.ecr.aws/nginx/nginx:1.23
             ports:
               - name: tcp
                 containerPort: 80
   ```

1. Apply the deployment to your cluster.

   ```
   kubectl apply -f tcp-sample-app.yaml
   ```

1. Define a Service of type LoadBalancer for the deployment in a file named `tcp-sample-service.yaml`.

   ```
   apiVersion: v1
   kind: Service
   metadata:
     name: tcp-sample-service
     namespace: default
     annotations:
       service.beta.kubernetes.io/aws-load-balancer-type: external
       service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
       service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
   spec:
     ports:
       - port: 80
         targetPort: 80
         protocol: TCP
     type: LoadBalancer
     selector:
       app: nginx
   ```

1. Apply the Service configuration to your cluster.

   ```
   kubectl apply -f tcp-sample-service.yaml
   ```

1. Provisioning the NLB for the Service can take a few minutes. Once the NLB is provisioned, the Service is assigned an address that corresponds to the DNS name of the NLB.

   ```
   kubectl get svc tcp-sample-service
   ```

   ```
   NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP                                                                    PORT(S)        AGE
   tcp-sample-service   LoadBalancer   172.16.115.212   k8s-default-tcpsampl-xxxxxxxxxx-xxxxxxxxxxxxxxxx.elb.<region>.amazonaws.com   80:30396/TCP   8s
   ```

1. Access the Service using the address of the NLB.

   ```
   curl k8s-default-tcpsampl-xxxxxxxxxx-xxxxxxxxxxxxxxxx.elb.<region>.amazonaws.com
   ```

   An example output is below.

   ```
   <!DOCTYPE html>
   <html>
   <head>
   <title>Welcome to nginx!</title>
   [...]
   ```

1. Clean up the resources you created.

   ```
   kubectl delete -f tcp-sample-service.yaml
   kubectl delete -f tcp-sample-app.yaml
   ```

## Cilium in-cluster load balancing
<a name="hybrid-nodes-service-lb-cilium"></a>

Cilium can be used as an in-cluster load balancer for workloads running on EKS Hybrid Nodes, which can be useful for environments that do not have load balancer infrastructure. Cilium’s load balancing capabilities are built on a combination of Cilium features including kube-proxy replacement, Load Balancer IP Address Management (IPAM), and BGP Control Plane. The responsibilities of these features are detailed below:
+  **Cilium kube-proxy replacement**: Handles routing Service traffic to backend pods.
+  **Cilium Load Balancer IPAM**: Manages IP addresses that can be assigned to Services of type `LoadBalancer`.
+  **Cilium BGP Control Plane**: Advertises IP addresses allocated by Load Balancer IPAM to the on-premises network.

If you are not using Cilium’s kube-proxy replacement, you can still use Cilium Load Balancer IPAM and the BGP Control Plane to allocate and assign IP addresses for Services of type LoadBalancer. In that case, load balancing from Services to backend pods is handled by kube-proxy and iptables rules, which is the default in EKS.
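
If you want to confirm whether kube-proxy replacement is enabled in your Cilium installation, one way to check (a sketch, assuming the Cilium agent runs as the `cilium` DaemonSet in the `kube-system` namespace, as in the installation instructions) is to inspect the agent status:

```
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
```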

### Prerequisites
<a name="_prerequisites_2"></a>
+ Cilium installed following the instructions in [Configure CNI for hybrid nodes](hybrid-nodes-cni.md) with or without kube-proxy replacement enabled. Cilium’s kube-proxy replacement requires running an operating system with a Linux kernel at least as recent as v4.19.57, v5.1.16, or v5.2.0. All recent versions of the operating systems supported for use with hybrid nodes meet this criteria, with the exception of Red Hat Enterprise Linux (RHEL) 8.x.
+ Cilium BGP Control Plane enabled following the instructions in [Configure Cilium BGP for hybrid nodes](hybrid-nodes-cilium-bgp.md). If you do not want to use BGP, you must use an alternative method to make your on-premises pod CIDRs routable on your on-premises network, see [Routable remote Pod CIDRs](hybrid-nodes-concepts-kubernetes.md#hybrid-nodes-concepts-k8s-pod-cidrs) for more information.
+ Helm installed in your command-line environment, see [Setup Helm instructions](helm.md).

### Procedure
<a name="_procedure_2"></a>

1. Create a file named `cilium-lbip-pool-loadbalancer.yaml` with a `CiliumLoadBalancerIPPool` resource to configure the Load Balancer IP address range for your Services of type LoadBalancer.
   + Replace `LB_IP_CIDR` with the IP address range to use for the Load Balancer IP addresses. To select a single IP address, use a `/32` CIDR. For more information, see [LoadBalancer IP Address Management](https://docs.cilium.io/en/stable/network/lb-ipam/) in the Cilium documentation.
   + The `serviceSelector` field is configured to match against the name of the Service you will create in a subsequent step. With this configuration, IPs from this pool will only be allocated to Services with the name `tcp-sample-service`.

     ```
     apiVersion: cilium.io/v2alpha1
     kind: CiliumLoadBalancerIPPool
     metadata:
       name: tcp-service-pool
     spec:
       blocks:
       - cidr: "LB_IP_CIDR"
       serviceSelector:
         matchLabels:
           io.kubernetes.service.name: tcp-sample-service
     ```

1. Apply the `CiliumLoadBalancerIPPool` resource to your cluster.

   ```
   kubectl apply -f cilium-lbip-pool-loadbalancer.yaml
   ```

1. Confirm there is at least one IP address available in the pool.

   ```
   kubectl get ciliumloadbalancerippools.cilium.io
   ```

   ```
   NAME               DISABLED   CONFLICTING   IPS AVAILABLE   AGE
   tcp-service-pool   false      False         1               24m
   ```

1. Create a file named `cilium-bgp-advertisement-loadbalancer.yaml` with a `CiliumBGPAdvertisement` resource to advertise the load balancer IP address for the Service you will create in the next step. If you are not using Cilium BGP, you can skip this step. The load balancer IP address used for your Service must be routable on your on-premises network for you to be able to query the service in the final step.
   + The `advertisementType` field is set to `Service` and `service.addresses` is set to `LoadBalancerIP` to only advertise the `LoadBalancerIP` for Services of type `LoadBalancer`.
   + The `selector` field is configured to match against the name of the Service you will create in a subsequent step. With this configuration, only `LoadBalancerIP` for Services with the name `tcp-sample-service` will be advertised.

     ```
     apiVersion: cilium.io/v2alpha1
     kind: CiliumBGPAdvertisement
     metadata:
       name: bgp-advertisement-tcp-service
       labels:
         advertise: bgp
     spec:
       advertisements:
         - advertisementType: "Service"
           service:
             addresses:
               - LoadBalancerIP
           selector:
             matchLabels:
               io.kubernetes.service.name: tcp-sample-service
     ```

1. Apply the `CiliumBGPAdvertisement` resource to your cluster. If you are not using Cilium BGP, you can skip this step.

   ```
   kubectl apply -f cilium-bgp-advertisement-loadbalancer.yaml
   ```

1. Define a sample application in a file named `tcp-sample-app.yaml`. The example below uses a simple NGINX deployment with a TCP port.

   ```
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: tcp-sample-app
     namespace: default
   spec:
     replicas: 3
     selector:
       matchLabels:
         app: nginx
     template:
       metadata:
         labels:
           app: nginx
       spec:
         containers:
           - name: nginx
             image: public.ecr.aws/nginx/nginx:1.23
             ports:
               - name: tcp
                 containerPort: 80
   ```

1. Apply the deployment to your cluster.

   ```
   kubectl apply -f tcp-sample-app.yaml
   ```

1. Define a Service of type LoadBalancer for the deployment in a file named `tcp-sample-service.yaml`.
   + You can request a specific IP address from the load balancer IP pool with the `lbipam.cilium.io/ips` annotation on the Service object. You can remove this annotation if you do not want to request a specific IP address for the Service.
   + The `loadBalancerClass` spec field is required to prevent the legacy AWS Cloud Provider from creating a Classic Load Balancer for the Service. In the example below this is configured to `io.cilium/bgp-control-plane` to use Cilium’s BGP Control Plane as the load balancer class. This field can alternatively be configured to `io.cilium/l2-announcer` to use Cilium’s [L2 Announcements feature](https://docs.cilium.io/en/latest/network/l2-announcements/) (currently in beta and not officially supported by AWS).

     ```
     apiVersion: v1
     kind: Service
     metadata:
       name: tcp-sample-service
       namespace: default
       annotations:
         lbipam.cilium.io/ips: "LB_IP_ADDRESS"
     spec:
       loadBalancerClass: io.cilium/bgp-control-plane
       ports:
         - port: 80
           targetPort: 80
           protocol: TCP
       type: LoadBalancer
       selector:
         app: nginx
     ```

1. Apply the Service to your cluster. The Service will be created with an external IP address that you can use to access the application.

   ```
   kubectl apply -f tcp-sample-service.yaml
   ```

1. Verify the Service was created successfully and has an IP assigned to it from the `CiliumLoadBalancerIPPool` created in the previous step.

   ```
   kubectl get svc tcp-sample-service
   ```

   ```
   NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
   tcp-sample-service   LoadBalancer   172.16.117.76   LB_IP_ADDRESS   80:31129/TCP   14m
   ```

1. If you are using Cilium in kube-proxy replacement mode, you can confirm Cilium is handling the load balancing for the Service by running the following command. In the output below, the `10.86.2.x` addresses are the pod IP addresses of the backend pods for the Service.

   ```
   kubectl -n kube-system exec ds/cilium -- cilium-dbg service list
   ```

   ```
   ID   Frontend               Service Type   Backend
   ...
   41   LB_IP_ADDRESS:80/TCP   LoadBalancer   1 => 10.86.2.76:80/TCP (active)
                                              2 => 10.86.2.130:80/TCP (active)
                                              3 => 10.86.2.141:80/TCP (active)
   ```

1. Confirm Cilium is advertising the IP address to the on-premises network via BGP. In the example below, there are five hybrid nodes, each advertising the `LB_IP_ADDRESS` for the `tcp-sample-service` Service to the on-premises network.
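
   One way to view the routes advertised by the nodes is with the Cilium CLI, if you have it installed in your command-line environment. The subcommand below is an assumption based on recent Cilium CLI versions; adjust it for the version you use.

   ```
   cilium bgp routes advertised ipv4 unicast
   ```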

   ```
   Node                   VRouter     Prefix             NextHop   Age     Attrs
   mi-026d6a261e355fba7   NODES_ASN   LB_IP_ADDRESS/32   0.0.0.0   12m3s   [{Origin: i} {Nexthop: 0.0.0.0}]
   mi-082f73826a163626e   NODES_ASN   LB_IP_ADDRESS/32   0.0.0.0   12m3s   [{Origin: i} {Nexthop: 0.0.0.0}]
   mi-09183e8a3d755abf6   NODES_ASN   LB_IP_ADDRESS/32   0.0.0.0   12m3s   [{Origin: i} {Nexthop: 0.0.0.0}]
   mi-0d78d815980ed202d   NODES_ASN   LB_IP_ADDRESS/32   0.0.0.0   12m3s   [{Origin: i} {Nexthop: 0.0.0.0}]
   mi-0daa253999fe92daa   NODES_ASN   LB_IP_ADDRESS/32   0.0.0.0   12m3s   [{Origin: i} {Nexthop: 0.0.0.0}]
   ```

1. Access the Service using the assigned load balancer IP address.

   ```
   curl LB_IP_ADDRESS
   ```

   An example output is below.

   ```
   <!DOCTYPE html>
   <html>
   <head>
   <title>Welcome to nginx!</title>
   [...]
   ```

1. Clean up the resources you created.

   ```
   kubectl delete -f tcp-sample-service.yaml
   kubectl delete -f tcp-sample-app.yaml
   kubectl delete -f cilium-lbip-pool-loadbalancer.yaml
   kubectl delete -f cilium-bgp-advertisement-loadbalancer.yaml
   ```

# Configure Kubernetes Network Policies for hybrid nodes
<a name="hybrid-nodes-network-policies"></a>

 AWS supports Kubernetes Network Policies (Layer 3 / Layer 4) for pod ingress and egress traffic when using Cilium as the CNI with EKS Hybrid Nodes. If you are running EKS clusters with nodes in AWS Cloud, AWS supports the [Amazon VPC CNI for Kubernetes Network Policies](cni-network-policy.md).

This topic covers how to configure Cilium and Kubernetes Network Policies with EKS Hybrid Nodes. For detailed information on Kubernetes Network Policies, see [Kubernetes Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) in the Kubernetes documentation.

## Configure network policies
<a name="hybrid-nodes-configure-network-policies"></a>

### Considerations
<a name="_considerations"></a>
+  AWS supports the upstream Kubernetes Network Policy specification for pod ingress and egress. AWS currently does not support `CiliumNetworkPolicy` or `CiliumClusterwideNetworkPolicy`.
+ The `policyEnforcementMode` Helm value can be used to control the default Cilium policy enforcement behavior. The default behavior allows all egress and ingress traffic. When an endpoint is selected by a network policy, it transitions to a default-deny state, where only explicitly allowed traffic is allowed. See the Cilium documentation for more information on the [default policy mode](https://docs.cilium.io/en/stable/security/policy/intro/#policy-mode-default) and [policy enforcement modes](https://docs.cilium.io/en/stable/security/policy/intro/#policy-enforcement-modes).
+ If you are changing `policyEnforcementMode` for an existing Cilium installation, you must restart the Cilium agent DaemonSet to apply the new policy enforcement mode, as shown in the sketch after this list.
+ Use `namespaceSelector` and `podSelector` to allow or deny traffic to/from namespaces and pods with matching labels. The `namespaceSelector` and `podSelector` can be used with `matchLabels` or `matchExpressions` to select namespaces and pods based on their labels.
+ Use `ingress.ports` and `egress.ports` to allow or deny traffic to/from ports and protocols.
+ The `ipBlock` field cannot be used to selectively allow or deny traffic to/from pod IP addresses ([#9209](https://github.com/cilium/cilium/issues/9209)). Using `ipBlock` selectors for node IPs is a beta feature in Cilium and is not supported by AWS.
+ See the [NetworkPolicy resource](https://kubernetes.io/docs/concepts/services-networking/network-policies/#networkpolicy-resource) in the Kubernetes documentation for information on the available fields for Kubernetes Network Policies.
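
As a minimal sketch of changing the enforcement mode, assuming Cilium was installed with Helm as described in [Configure CNI for hybrid nodes](hybrid-nodes-cni.md), you can set the `policyEnforcementMode` Helm value and restart the Cilium agent DaemonSet. The value `default` shown below is only an example.

```
helm upgrade cilium oci://public.ecr.aws/eks/cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set policyEnforcementMode=default

kubectl -n kube-system rollout restart ds/cilium
```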

### Prerequisites
<a name="_prerequisites"></a>
+ Cilium installed following the instructions in [Configure CNI for hybrid nodes](hybrid-nodes-cni.md).
+ Helm installed in your command-line environment, see [Setup Helm instructions](helm.md).

### Procedure
<a name="_procedure"></a>

The following procedure sets up network policies for a sample microservices application so that components can only talk to other components that are required for the application to function. The procedure uses the [Istio Bookinfo](https://istio.io/latest/docs/examples/bookinfo/) sample microservices application.

The Bookinfo application consists of four separate microservices with the following relationships:
+  **productpage**. The productpage microservice calls the details and reviews microservices to populate the page.
+  **details**. The details microservice contains book information.
+  **reviews**. The reviews microservice contains book reviews. It also calls the ratings microservice.
+  **ratings**. The ratings microservice contains book ranking information that accompanies a book review.

  1. Create the sample application.

     ```
     kubectl apply -f https://raw.githubusercontent.com/istio/istio/refs/heads/master/samples/bookinfo/platform/kube/bookinfo.yaml
     ```

  1. Confirm the application is running successfully and note the pod IP address for the productpage microservice. You will use this pod IP address to query each microservice in the subsequent steps.

     ```
     kubectl get pods -o wide
     ```

     ```
     NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE
     details-v1-766844796b-9wff2       1/1     Running   0          7s    10.86.3.7     mi-0daa253999fe92daa
     productpage-v1-54bb874995-lwfgg   1/1     Running   0          7s    10.86.2.193   mi-082f73826a163626e
     ratings-v1-5dc79b6bcd-59njm       1/1     Running   0          7s    10.86.2.232   mi-082f73826a163626e
     reviews-v1-598b896c9d-p2289       1/1     Running   0          7s    10.86.2.47    mi-026d6a261e355fba7
     reviews-v2-556d6457d-djktc        1/1     Running   0          7s    10.86.3.58    mi-0daa253999fe92daa
     reviews-v3-564544b4d6-g8hh4       1/1     Running   0          7s    10.86.2.69    mi-09183e8a3d755abf6
     ```

  1. Create a pod that will be used throughout to test the network policies. Note the pod is created in the `default` namespace with the label `access: true`.

     ```
     kubectl run curl-pod --image=curlimages/curl -i --tty --labels=access=true --namespace=default --overrides='{"spec": { "nodeSelector": {"eks.amazonaws.com/compute-type": "hybrid"}}}' -- /bin/sh
     ```

  1. Test access to the productpage microservice. In the example below, we use the pod IP address of the productpage pod (`10.86.2.193`) to query the microservice. Replace this with the pod IP address of the productpage pod in your environment.

     ```
     curl -s http://10.86.2.193:9080/productpage | grep -o "<title>.*</title>"
     ```

     ```
     <title>Simple Bookstore App</title>
     ```

  1. You can exit the test curl pod by typing `exit` and can reattach to the pod by running the following command.

     ```
     kubectl attach curl-pod -c curl-pod -i -t
     ```

  1. To demonstrate the effects of the network policies in the following steps, we first create a network policy that denies all traffic for the BookInfo microservices. Create a file called `network-policy-deny-bookinfo.yaml` that defines the deny network policy.

     ```
     apiVersion: networking.k8s.io/v1
     kind: NetworkPolicy
     metadata:
       name: deny-bookinfo
       namespace: default
     spec:
       podSelector:
         matchExpressions:
         - key: app
           operator: In
           values: ["productpage", "details", "reviews", "ratings"]
       policyTypes:
       - Ingress
       - Egress
     ```

  1. Apply the deny network policy to your cluster.

     ```
     kubectl apply -f network-policy-deny-bookinfo.yaml
     ```

  1. Test access to the BookInfo application. In the example below, we use the pod IP address of the productpage pod (`10.86.2.193`) to query the microservice. Replace this with the pod IP address of the productpage pod in your environment.

     ```
     curl http://10.86.2.193:9080/productpage --max-time 10
     ```

     ```
     curl: (28) Connection timed out after 10001 milliseconds
     ```

  1. Create a file called `network-policy-productpage.yaml` that defines the productpage network policy. The policy has the following rules:
     + allows ingress traffic from pods with the label `access: true` (the curl pod created in the previous step)
     + allows egress TCP traffic on port `9080` for the details, reviews, and ratings microservices
     + allows egress TCP/UDP traffic on port `53` for CoreDNS which runs in the `kube-system` namespace

       ```
       apiVersion: networking.k8s.io/v1
       kind: NetworkPolicy
       metadata:
         name: productpage-policy
         namespace: default
       spec:
         podSelector:
           matchLabels:
             app: productpage
         policyTypes:
         - Ingress
         - Egress
         ingress:
         - from:
           - podSelector:
               matchLabels:
                 access: "true"
         egress:
         - to:
           - podSelector:
               matchExpressions:
               - key: app
                 operator: In
                 values: ["details", "reviews", "ratings"]
           ports:
           - port: 9080
             protocol: TCP
         - to:
           - namespaceSelector:
               matchLabels:
                 kubernetes.io/metadata.name: kube-system
             podSelector:
               matchLabels:
                 k8s-app: kube-dns
           ports:
           - port: 53
             protocol: UDP
           - port: 53
             protocol: TCP
       ```

  1. Apply the productpage network policy to your cluster.

     ```
     kubectl apply -f network-policy-productpage.yaml
     ```

  1. Connect to the curl pod and test access to the Bookinfo application. Access to the productpage microservice is now allowed, but the other microservices are still denied because they are still subject to the deny network policy. In the examples below, we use the pod IP address of the productpage pod (`10.86.2.193`) to query the microservice. Replace this with the pod IP address of the productpage pod in your environment.

     ```
     kubectl attach curl-pod -c curl-pod -i -t
     ```

     ```
     curl -s http://10.86.2.193:9080/productpage | grep -o "<title>.*</title>"
     <title>Simple Bookstore App</title>
     ```

     ```
     curl -s http://10.86.2.193:9080/api/v1/products/1
     {"error": "Sorry, product details are currently unavailable for this book."}
     ```

     ```
     curl -s http://10.86.2.193:9080/api/v1/products/1/reviews
     {"error": "Sorry, product reviews are currently unavailable for this book."}
     ```

     ```
     curl -s http://10.86.2.193:9080/api/v1/products/1/ratings
     {"error": "Sorry, product ratings are currently unavailable for this book."}
     ```

  1. Create a file called `network-policy-details.yaml` that defines the details network policy. The policy allows only ingress traffic from the productpage microservice.

     ```
     apiVersion: networking.k8s.io/v1
     kind: NetworkPolicy
     metadata:
       name: details-policy
       namespace: default
     spec:
       podSelector:
         matchLabels:
           app: details
       policyTypes:
       - Ingress
       ingress:
       - from:
         - podSelector:
             matchLabels:
               app: productpage
     ```

  1. Create a file called `network-policy-reviews.yaml` that defines the reviews network policy. The policy allows only ingress traffic from the productpage microservice and only egress traffic to the ratings microservice and CoreDNS.

     ```
     apiVersion: networking.k8s.io/v1
     kind: NetworkPolicy
     metadata:
       name: reviews-policy
       namespace: default
     spec:
       podSelector:
         matchLabels:
           app: reviews
       policyTypes:
       - Ingress
       - Egress
       ingress:
       - from:
         - podSelector:
             matchLabels:
               app: productpage
       egress:
       - to:
         - podSelector:
             matchLabels:
               app: ratings
       - to:
         - namespaceSelector:
             matchLabels:
               kubernetes.io/metadata.name: kube-system
           podSelector:
             matchLabels:
               k8s-app: kube-dns
         ports:
         - port: 53
           protocol: UDP
         - port: 53
           protocol: TCP
     ```

  1. Create a file called `network-policy-ratings.yaml` that defines the ratings network policy. The policy allows only ingress traffic from the productpage and reviews microservices.

     ```
     apiVersion: networking.k8s.io/v1
     kind: NetworkPolicy
     metadata:
       name: ratings-policy
       namespace: default
     spec:
       podSelector:
         matchLabels:
           app: ratings
       policyTypes:
       - Ingress
       ingress:
       - from:
         - podSelector:
             matchExpressions:
             - key: app
               operator: In
               values: ["productpage", "reviews"]
     ```

  1. Apply the details, reviews, and ratings network policies to your cluster.

     ```
     kubectl apply -f network-policy-details.yaml
     kubectl apply -f network-policy-reviews.yaml
     kubectl apply -f network-policy-ratings.yaml
     ```

  1. Connect to the curl pod and test access to the Bookinfo application. In the examples below, we use the pod IP address of the productpage pod (`10.86.2.193`) to query the microservice. Replace this with the pod IP address of the productpage pod in your environment.

     ```
     kubectl attach curl-pod -c curl-pod -i -t
     ```

     Test the details microservice.

     ```
     curl -s http://10.86.2.193:9080/api/v1/products/1
     ```

     ```
     {"id": 1, "author": "William Shakespeare", "year": 1595, "type": "paperback", "pages": 200, "publisher": "PublisherA", "language": "English", "ISBN-10": "1234567890", "ISBN-13": "123-1234567890"}
     ```

     Test the reviews microservice.

     ```
     curl -s http://10.86.2.193:9080/api/v1/products/1/reviews
     ```

     ```
     {"id": "1", "podname": "reviews-v1-598b896c9d-p2289", "clustername": "null", "reviews": [{"reviewer": "Reviewer1", "text": "An extremely entertaining play by Shakespeare. The slapstick humour is refreshing!"}, {"reviewer": "Reviewer2", "text": "Absolutely fun and entertaining. The play lacks thematic depth when compared to other plays by Shakespeare."}]}
     ```

     Test the ratings microservice.

     ```
     curl -s http://10.86.2.193:9080/api/v1/products/1/ratings
     ```

     ```
     {"id": 1, "ratings": {"Reviewer1": 5, "Reviewer2": 4}}
     ```

  1. Clean up the resources you created in this procedure.

     ```
     kubectl delete -f network-policy-deny-bookinfo.yaml
     kubectl delete -f network-policy-productpage.yaml
     kubectl delete -f network-policy-details.yaml
     kubectl delete -f network-policy-reviews.yaml
     kubectl delete -f network-policy-ratings.yaml
     kubectl delete -f https://raw.githubusercontent.com/istio/istio/refs/heads/master/samples/bookinfo/platform/kube/bookinfo.yaml
     kubectl delete pod curl-pod
     ```

# Concepts for hybrid nodes
<a name="hybrid-nodes-concepts"></a>

With *Amazon EKS Hybrid Nodes*, you join physical or virtual machines running in on-premises or edge environments to Amazon EKS clusters running in the AWS Cloud. This approach brings many benefits, but also introduces new networking concepts and architectures for those familiar with running Kubernetes clusters in a single network environment.

The following sections dive deep into the Kubernetes and networking concepts for EKS Hybrid Nodes and detail how traffic flows through the hybrid architecture. These sections assume that you are familiar with basic Kubernetes networking concepts, such as pods, nodes, Services, the Kubernetes control plane, kubelet, and kube-proxy.

We recommend reading these pages in order, starting with the [Networking concepts for hybrid nodes](hybrid-nodes-concepts-networking.md), then the [Kubernetes concepts for hybrid nodes](hybrid-nodes-concepts-kubernetes.md), and finally the [Network traffic flows for hybrid nodes](hybrid-nodes-concepts-traffic-flows.md).

**Topics**
+ [Networking concepts for hybrid nodes](hybrid-nodes-concepts-networking.md)
+ [Kubernetes concepts for hybrid nodes](hybrid-nodes-concepts-kubernetes.md)
+ [Network traffic flows for hybrid nodes](hybrid-nodes-concepts-traffic-flows.md)

# Networking concepts for hybrid nodes
<a name="hybrid-nodes-concepts-networking"></a>

This section details the core networking concepts and the constraints you must consider when designing your network topology for EKS Hybrid Nodes.

## Networking concepts for EKS Hybrid Nodes
<a name="_networking_concepts_for_eks_hybrid_nodes"></a>

![\[High level hybrid nodes network diagram\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-highlevel-network.png)


 **VPC as the network hub** 

All traffic that crosses the cloud boundary routes through your VPC. This includes traffic between the EKS control plane or pods running in AWS to hybrid nodes or pods running on them. You can think of your cluster’s VPC as the network hub between your hybrid nodes and the rest of the cluster. This architecture gives you full control of the traffic and its routing but also makes it your responsibility to correctly configure routes, security groups, and firewalls for the VPC.

 **EKS control plane to the VPC** 

The EKS control plane attaches **Elastic Network Interfaces (ENIs)** to your VPC. These ENIs handle traffic to and from the EKS API server. You control the placement of the EKS control plane ENIs when you configure your cluster, as EKS attaches ENIs to the subnets you pass during cluster creation.

EKS associates security groups with the ENIs that it attaches to your subnets. These security groups allow traffic to and from the EKS control plane through the ENIs. This is important for EKS Hybrid Nodes because you must allow traffic from the hybrid nodes, and from the pods running on them, to the EKS control plane ENIs.
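
For illustration only, assuming you manage the relevant security group yourself and your remote node CIDR is `10.80.0.0/16` (both placeholders), an inbound rule allowing HTTPS from the hybrid nodes could be added like this:

```
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 443 \
  --cidr 10.80.0.0/16
```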

 **Remote Node Networks** 

The remote node networks, specifically the remote node CIDRs, are the ranges of IPs assigned to the machines you use as hybrid nodes. When you provision hybrid nodes, they reside in your on-premises data center or edge location, which is a different network domain than the EKS control plane and VPC. Each hybrid node has an IP address, or addresses, from a remote node CIDR that is distinct from the subnets in your VPC.

You configure the EKS cluster with these remote node CIDRs so EKS knows to route all traffic destined for the hybrid nodes IPs through your cluster VPC, such as requests to the kubelet API. The connections to the `kubelet` API are used in the `kubectl attach`, `kubectl cp`, `kubectl exec`, `kubectl logs`, and `kubectl port-forward` commands.
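
For reference, the remote node CIDRs are supplied as part of the cluster’s remote network configuration. A sketch of this with the AWS CLI at cluster creation might look like the following; the values are placeholders and the parameter shape is an assumption, so see the hybrid nodes cluster creation documentation for the authoritative syntax.

```
aws eks create-cluster \
  --name my-cluster \
  --role-arn arn:aws:iam::111122223333:role/eks-cluster-role \
  --resources-vpc-config subnetIds=subnet-0123456789abcdef0,subnet-0123456789abcdef1 \
  --remote-network-config '{"remoteNodeNetworks":[{"cidrs":["10.80.0.0/16"]}]}'
```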

 **Remote Pod Networks** 

The remote pod networks are the ranges of IPs assigned to the pods running on the hybrid nodes. Generally, you configure your CNI with these ranges and the IP Address Management (IPAM) functionality of the CNI takes care of assigning a slice of these ranges to each hybrid node. When you create a pod, the CNI assigns an IP to the pod from the slice allocated to the node where the pod has been scheduled.

You configure the EKS cluster with these remote pod CIDRs so the EKS control plane knows to route all traffic destined for the pods running on the hybrid nodes through your cluster’s VPC, such as communication with webhooks.
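
As an illustration of how a CNI can be configured with these ranges, a Cilium installation using its cluster-pool IPAM mode might carve per-node slices out of a remote pod CIDR with Helm values like the following. This is a sketch, not a complete configuration, and the CIDR and mask size are placeholders.

```
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:
    - "10.86.0.0/16"
    clusterPoolIPv4MaskSize: 25
```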

![\[Remote Pod Networks\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-remote-pod-cidrs.png)


 **On-premises to the VPC** 

The on-premises network you use for hybrid nodes must route to the VPC you use for your EKS cluster. There are several [Network-to-Amazon VPC connectivity options](https://docs.aws.amazon.com/whitepapers/latest/aws-vpc-connectivity-options/network-to-amazon-vpc-connectivity-options.html) available to connect your on-premises network to a VPC. You can also use your own VPN solution.

It is important that you configure the routing correctly on the AWS Cloud side in the VPC and in your on-premises network, so that both networks route the right traffic through the connection for the two networks.

In the VPC, all traffic going to the remote node and remote pod networks must route through the connection to your on-premises network (referred to as the "gateway"). If some of your subnets have different route tables, you must configure each route table with the routes for your hybrid nodes and the pods running on them. This applies to the subnets that the EKS control plane ENIs are attached to, as well as to subnets that contain EC2 nodes or pods that must communicate with hybrid nodes.

In your on-premises network, you must configure your network to allow traffic to and from your EKS cluster’s VPC and the other AWS services required for hybrid nodes. The traffic for the EKS cluster traverses the gateway in both directions.

## Networking constraints
<a name="_networking_constraints"></a>

 **Fully routed network** 

The main constraint is that the EKS control plane and all nodes, cloud or hybrid nodes, need to form a **fully routed** network. This means that all nodes must be able to reach each other at layer three, by IP address.

The EKS control plane and cloud nodes are already reachable from each other because they are in a flat network (the VPC). The hybrid nodes, however, are in a different network domain. This is why you need to configure additional routing in the VPC and on your on-premises network to route traffic between the hybrid nodes and the rest of the cluster. If the hybrid nodes are reachable from each other and from the VPC, your hybrid nodes can be in one single flat network or in multiple segmented networks.

 **Routable remote pod CIDRs** 

For the EKS control plane to communicate with pods running on hybrid nodes (for example, webhooks or the Metrics Server) or for pods running on cloud nodes to communicate with pods running on hybrid nodes (workload east-west communication), your remote pod CIDR must be routable from the VPC. This means that the VPC must be able to route traffic to the pod CIDRs through the gateway to your on-premises network and that your on-premises network must be able to route the traffic for a pod to the right node.

It’s important to note the distinction between the pod routing requirements in the VPC and on-premises. The VPC only needs to know that any traffic going to a remote pod should go through the gateway. If you only have one remote pod CIDR, you only need one route.
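
For example, assuming your on-premises connection is a transit gateway attachment and a single remote pod CIDR of `10.86.0.0/16` (both placeholders), the VPC side could be a single route per route table:

```
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 10.86.0.0/16 \
  --transit-gateway-id tgw-0123456789abcdef0
```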

This requirement holds for all hops in your on-premises network up to the local router in the same subnet as your hybrid nodes. That local router is the only one that needs to be aware of the pod CIDR slice assigned to each node, so that traffic for a particular pod is delivered to the node where the pod has been scheduled.

You can choose to propagate the routes for the on-premises pod CIDRs from your local on-premises router to the VPC route tables, but it isn’t necessary. Propagating them is mainly useful if your on-premises pod CIDRs change frequently and the VPC route tables need to be updated to reflect those changes, which is uncommon.

Note that making your on-premises pod CIDRs routable is optional. If you don’t need to run webhooks on your hybrid nodes or have pods on cloud nodes talk to pods on hybrid nodes, you don’t need to configure routing for the pod CIDRs on your on-premises network.

 *Why do the on-premises pod CIDRs need to be routable with hybrid nodes?* 

When using EKS with the VPC CNI for your cloud nodes, the VPC CNI assigns IPs directly from the VPC to the pods. This means there is no need for any special routing, as both cloud pods and the EKS control plane can reach the Pod IPs directly.

When running on-premises (and with other CNIs in the cloud), the pods typically run in an isolated overlay network and the CNI takes care of delivering traffic between pods. This is commonly done through encapsulation: the CNI converts pod-to-pod traffic into node-to-node traffic, taking care of encapsulating and de-encapsulating on both ends. This way, there is no need for extra configuration on the nodes and on the routers.

The networking with hybrid nodes is unique because it combines both topologies: the EKS control plane and cloud nodes (with the VPC CNI) expect a flat network that includes nodes and pods, while the pods running on hybrid nodes are in an overlay network that uses VXLAN encapsulation (the default in Cilium). Pods running on hybrid nodes can reach the EKS control plane and pods running on cloud nodes, assuming the on-premises network can route to the VPC. However, without routes for the pod CIDRs on the on-premises network, return traffic destined for an on-premises pod IP is eventually dropped, because the network doesn’t know how to reach the overlay network or route to the correct node.

# Kubernetes concepts for hybrid nodes
<a name="hybrid-nodes-concepts-kubernetes"></a>

This page details the key Kubernetes concepts that underpin the EKS Hybrid Nodes system architecture.

## EKS control plane in the VPC
<a name="hybrid-nodes-concepts-k8s-api"></a>

The IPs of the EKS control plane ENIs are stored in the `kubernetes` `Endpoints` object in the `default` namespace. When EKS creates new ENIs or removes older ones, EKS updates this object so the list of IPs is always up-to-date.

You can use these endpoints through the `kubernetes` Service, also in the `default` namespace. This service, of `ClusterIP` type, always gets assigned the first IP of the cluster’s service CIDR. For example, for the service CIDR `172.16.0.0/16`, the service IP will be `172.16.0.1`.
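
You can inspect both objects directly, for example:

```
kubectl get endpoints kubernetes -n default
kubectl get svc kubernetes -n default
```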

Generally, this is how pods (regardless of whether they run on cloud or hybrid nodes) access the EKS Kubernetes API server. Pods use the service IP as the destination IP, which gets translated to the actual IP of one of the EKS control plane ENIs. The primary exception is `kube-proxy`, because it sets up the translation.

## EKS API server endpoint
<a name="hybrid-nodes-concepts-k8s-eks-api"></a>

The `kubernetes` service IP isn’t the only way to access the EKS API server. EKS also creates a Route53 DNS name when you create your cluster. This is the `endpoint` field of your EKS cluster when calling the EKS `DescribeCluster` API action.
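
For example, you can retrieve this field with the AWS CLI; the cluster name and Region below are placeholders, and the response shown after the command is abbreviated.

```
aws eks describe-cluster --name my-cluster --region us-west-2
```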

```
{
    "cluster": {
        "endpoint": "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.gr7.us-west-2.eks.amazonaws.com",
        "name": "my-cluster",
        "status": "ACTIVE"
    }
}
```

In a public endpoint access or public and private endpoint access cluster, your hybrid nodes will resolve this DNS name to a public IP by default, routable through the internet. In a private endpoint access cluster, the DNS name resolves to the private IPs of the EKS control plane ENIs.

This is how the `kubelet` and `kube-proxy` access the Kubernetes API server. If you want all your Kubernetes cluster traffic to flow through the VPC, you either need to configure your cluster in private access mode or modify your on-premises DNS server to resolve the EKS cluster endpoint to the private IPs of the EKS control plane ENIs.
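
To check which addresses your hybrid nodes currently resolve for the cluster endpoint, you can query DNS from one of the nodes, for example (the endpoint is a placeholder):

```
dig +short xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.gr7.us-west-2.eks.amazonaws.com
```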

## `kubelet` endpoint
<a name="hybrid-nodes-concepts-k8s-kubelet-api"></a>

The `kubelet` exposes several REST endpoints, allowing other parts of the system to interact with and gather information from each node. In most clusters, the majority of traffic to the `kubelet` server comes from the control plane, but certain monitoring agents might also interact with it.

Through this interface, the `kubelet` handles various requests: fetching logs (`kubectl logs`), executing commands inside containers (`kubectl exec`), and port-forwarding traffic (`kubectl port-forward`). Each of these requests interacts with the underlying container runtime through the `kubelet`, appearing seamless to cluster administrators and developers.

The most common consumer of this API is the Kubernetes API server. When you use any of the `kubectl` commands mentioned previously, `kubectl` makes an API request to the API server, which then calls the `kubelet` API of the node where the pod is running. This is the main reason why the node IP needs to be reachable from the EKS control plane and why, even if your pods are running, you won’t be able to access their logs or `exec` if the node route is misconfigured.

 **Node IPs** 

When the EKS control plane communicates with a node, it uses one of the addresses reported in the `Node` object status (`status.addresses`).

With EKS cloud nodes, it’s common for the kubelet to report the private IP of the EC2 instance as an `InternalIP` during the node registration. This IP is then validated by the Cloud Controller Manager (CCM) making sure it belongs to the EC2 instance. In addition, the CCM typically adds the public IPs (as `ExternalIP`) and DNS names (`InternalDNS` and `ExternalDNS`) of the instance to the node status.

However, there is no CCM for hybrid nodes. When you register a hybrid node with the EKS Hybrid Nodes CLI (`nodeadm`), it configures the kubelet to report your machine’s IP directly in the node’s status, without the CCM.

```
apiVersion: v1
kind: Node
metadata:
  name: my-node-1
spec:
  providerID: eks-hybrid:///us-west-2/my-cluster/my-node-1
status:
  addresses:
  - address: 10.1.1.236
    type: InternalIP
  - address: my-node-1
    type: Hostname
```

If your machine has multiple IPs, the kubelet selects one of them following its own logic. You can control the selected IP with the `--node-ip` flag, which you can pass in the `nodeadm` config under `spec.kubelet.flags`. Only the IP reported in the `Node` object needs a route from the VPC. Your machines can have other IPs that aren’t reachable from the cloud.
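
To confirm which IP the kubelet registered (and therefore which address the EKS control plane uses to reach the node), you can inspect the node’s reported addresses. The node name `my-node-1` matches the example object above.

```
kubectl get node my-node-1 -o jsonpath='{.status.addresses}'
```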

## `kube-proxy`
<a name="hybrid-nodes-concepts-k8s-kube-proxy"></a>

 `kube-proxy` is responsible for implementing the Service abstraction at the networking layer of each node. It acts as a network proxy and load balancer for traffic destined to Kubernetes Services. By continuously watching the Kubernetes API server for changes related to Services and Endpoints, `kube-proxy` dynamically updates the underlying host’s networking rules to ensure traffic is properly directed.

In `iptables` mode, `kube-proxy` programs several `netfilter` chains to handle service traffic. The rules form the following hierarchy:

1.  **KUBE-SERVICES chain**: The entry point for all service traffic. It has rules matching each service’s `ClusterIP` and port.

1.  **KUBE-SVC-XXX chains**: Service-specific chains that contain the load balancing rules for each service.

1.  **KUBE-SEP-XXX chains**: Endpoint-specific chains that contain the actual `DNAT` rules.

Let’s examine what happens for a service `test-server` in the `default` namespace:
+ Service ClusterIP: `172.16.31.14`
+ Service port: `80`
+ Backing pods: `10.2.0.110`, `10.2.1.39`, and `10.2.2.254`

When we inspect the `iptables` rules (using `iptables-save | grep -A10 KUBE-SERVICES`):

1. In the **KUBE-SERVICES** chain, we find a rule matching the service:

   ```
   -A KUBE-SERVICES -d 172.16.31.14/32 -p tcp -m comment --comment "default/test-server cluster IP" -m tcp --dport 80 -j KUBE-SVC-XYZABC123456
   ```
   + This rule matches packets destined for 172.16.31.14:80
   + The comment indicates what this rule is for: `default/test-server cluster IP` 
   + Matching packets jump to the `KUBE-SVC-XYZABC123456` chain

1. The **KUBE-SVC-XYZABC123456** chain has probability-based load balancing rules:

   ```
   -A KUBE-SVC-XYZABC123456 -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-POD1XYZABC
   -A KUBE-SVC-XYZABC123456 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-POD2XYZABC
   -A KUBE-SVC-XYZABC123456 -j KUBE-SEP-POD3XYZABC
   ```
   + First rule: 33.3% chance to jump to `KUBE-SEP-POD1XYZABC` 
   + Second rule: 50% chance of the remaining traffic (33.3% of total) to jump to `KUBE-SEP-POD2XYZABC` 
   + Last rule: All remaining traffic (33.3% of total) jumps to `KUBE-SEP-POD3XYZABC` 

1. The individual **KUBE-SEP-XXX** chains perform the DNAT (Destination NAT):

   ```
   -A KUBE-SEP-POD1XYZABC -p tcp -m tcp -j DNAT --to-destination 10.2.0.110:80
   -A KUBE-SEP-POD2XYZABC -p tcp -m tcp -j DNAT --to-destination 10.2.1.39:80
   -A KUBE-SEP-POD3XYZABC -p tcp -m tcp -j DNAT --to-destination 10.2.2.254:80
   ```
   + These DNAT rules rewrite the destination IP and port to direct traffic to specific pods.
   + Each rule handles about 33.3% of the traffic, providing even load balancing between `10.2.0.110`, `10.2.1.39` and `10.2.2.254`.

This multi-level chain structure enables `kube-proxy` to efficiently implement service load balancing and redirection through kernel-level packet manipulation, without requiring a proxy process in the data path.

### Impact on Kubernetes operations
<a name="hybrid-nodes-concepts-k8s-operations"></a>

A broken `kube-proxy` on a node prevents that node from routing Service traffic properly, causing timeouts or failed connections for pods that rely on cluster Services. This can be especially disruptive when a node is first registered. The CNI needs to talk to the Kubernetes API server to get information, such as the node’s pod CIDR, before it can configure any pod networking. To do that, it uses the `kubernetes` Service IP. However, if `kube-proxy` hasn’t been able to start or has failed to set the right `iptables` rules, the requests going to the `kubernetes` service IP aren’t translated to the actual IPs of the EKS control plane ENIs. As a consequence, the CNI will enter a crash loop and none of the pods will be able to run properly.

We know pods use the `kubernetes` service IP to communicate with the Kubernetes API server, but `kube-proxy` needs to first set `iptables` rules to make that work.

How does `kube-proxy` communicate with the API server?

The `kube-proxy` must be configured to use the actual IP(s) of the Kubernetes API server, or a DNS name that resolves to them. In the case of EKS, EKS configures the default `kube-proxy` to point to the Route53 DNS name that EKS creates when you create the cluster. You can see this value in the `kube-proxy` ConfigMap in the `kube-system` namespace. The content of this ConfigMap is a `kubeconfig` that gets injected into the `kube-proxy` pod, so look for the `clusters[0].cluster.server` field. This value will match the `endpoint` field of your EKS cluster (when calling the EKS `DescribeCluster` API).

```
apiVersion: v1
data:
  kubeconfig: |-
    kind: Config
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.gr7.us-west-2.eks.amazonaws.com
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
```
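
As a quick check, you can extract the API server URL that `kube-proxy` uses and compare it with your cluster’s endpoint; `my-cluster` is a placeholder cluster name.

```
kubectl get configmap kube-proxy --namespace kube-system -o jsonpath='{.data.kubeconfig}' | grep 'server:'
aws eks describe-cluster --name my-cluster --query 'cluster.endpoint' --output text
```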

## Routable remote Pod CIDRs
<a name="hybrid-nodes-concepts-k8s-pod-cidrs"></a>

The [Networking concepts for hybrid nodes](hybrid-nodes-concepts-networking.md) page details the requirements to run webhooks on hybrid nodes or to have pods running on cloud nodes communicate with pods running on hybrid nodes. The key requirement is that the on-premises router needs to know which node is responsible for a particular pod IP. There are several ways to achieve this, including Border Gateway Protocol (BGP), static routes, and Address Resolution Protocol (ARP) proxying. These are covered in the following sections.

 **Border Gateway Protocol (BGP)** 

If your CNI supports it (such as Cilium and Calico), you can use the BGP mode of your CNI to propagate routes for your per-node pod CIDRs from your nodes to your local router. When using the CNI’s BGP mode, your CNI acts as a virtual router, so your local router thinks the pod CIDR belongs to a different subnet and your node is the gateway to that subnet.

![\[Hybrid nodes BGP routing\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-bgp.png)


 **Static routes** 

Alternatively, you can configure static routes in your local router. This is the simplest way to route traffic to the on-premises pod CIDRs, but it is also the most error prone and difficult to maintain. You need to make sure that the routes are always up-to-date with the existing nodes and their assigned pod CIDRs. If your number of nodes is small and your infrastructure is static, this is a viable option and removes the need for BGP support in your router. If you opt for this, we recommend configuring your CNI with the pod CIDR slice that you want to assign to each node instead of letting its IPAM decide.

![\[Hybrid nodes static routing\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-static-routes.png)
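
As a rough sketch, if your local router were a Linux machine, the static routes might look like the following. The pod CIDRs and node IPs are the example values used elsewhere on this page; the equivalent configuration on a hardware router depends on the vendor.

```
# Route each node's per-node pod CIDR to that node's IP
ip route add 10.85.1.0/24 via 10.80.0.2
ip route add 10.85.2.0/24 via 10.80.0.3
```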


 **Address Resolution Protocol (ARP) proxying** 

ARP proxying is another approach to make on-premises pod IPs routable, particularly useful when your hybrid nodes are on the same Layer 2 network as your local router. With ARP proxying enabled, a node responds to ARP requests for pod IPs it hosts, even though those IPs belong to a different subnet.

When a device on your local network tries to reach a pod IP, it first sends an ARP request asking "Who has this IP?". The hybrid node hosting that pod will respond with its own MAC address, saying "I can handle traffic for that IP." This creates a direct path between devices on your local network and the pods without requiring router configuration.

For this to work, your CNI must support proxy ARP functionality. Cilium has built-in support for proxy ARP that you can enable through configuration. The key consideration is that the pod CIDR must not overlap with any other network in your environment, as this could cause routing conflicts.

This approach has several advantages:
+ No need to configure your router with BGP or maintain static routes
+ Works well in environments where you don’t have control over your router configuration

![\[Hybrid nodes ARP proxying\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-arp-proxy.png)


## Pod-to-Pod encapsulation
<a name="hybrid-nodes-concepts-k8s-pod-encapsulation"></a>

In on-premises environments, CNIs typically use encapsulation protocols to create overlay networks that can operate on top of the physical network without the need to re-configure it. This section explains how this encapsulation works. Note that some of the details might vary depending on the CNI you are using.

Encapsulation wraps original pod network packets inside another network packet that can be routed through the underlying physical network. This allows pods to communicate across nodes running the same CNI without requiring the physical network to know how to route those pod CIDRs.

The most common encapsulation protocol used with Kubernetes is Virtual Extensible LAN (VXLAN), though others (such as `Geneve`) are also available depending on your CNI.

### VXLAN encapsulation
<a name="_vxlan_encapsulation"></a>

VXLAN encapsulates Layer 2 Ethernet frames within UDP packets. When a pod sends traffic to another pod on a different node, the CNI performs the following:

1. The CNI intercepts packets from Pod A

1. The CNI wraps the original packet in a VXLAN header

1. This wrapped packet is then sent through the node’s regular networking stack to the destination node

1. The CNI on the destination node unwraps the packet and delivers it to Pod B

Here’s what happens to the packet structure during VXLAN encapsulation:

Original Pod-to-Pod Packet:

```
+-----------------+---------------+-------------+-----------------+
| Ethernet Header | IP Header     | TCP/UDP     | Payload         |
| Src: Pod A MAC  | Src: Pod A IP | Src Port    |                 |
| Dst: Pod B MAC  | Dst: Pod B IP | Dst Port    |                 |
+-----------------+---------------+-------------+-----------------+
```

After VXLAN Encapsulation:

```
+-----------------+-------------+--------------+------------+---------------------------+
| Outer Ethernet  | Outer IP    | Outer UDP    | VXLAN      | Original Pod-to-Pod       |
| Src: Node A MAC | Src: Node A | Src: Random  | VNI: xx    | Packet (unchanged         |
| Dst: Node B MAC | Dst: Node B | Dst: 4789    |            | from above)               |
+-----------------+-------------+--------------+------------+---------------------------+
```

The VXLAN Network Identifier (VNI) distinguishes between different overlay networks.
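
If you want to see this on a node, most CNIs create a dedicated VXLAN interface on the host that you can inspect; the interface name and UDP port depend on your CNI (Cilium, for example, typically creates `cilium_vxlan` and uses port `8472`).

```
ip -d link show type vxlan
```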

### Pod communication scenarios
<a name="_pod_communication_scenarios"></a>

 **Pods on the same hybrid node** 

When pods on the same hybrid node communicate, no encapsulation is typically needed. The CNI sets up local routes that direct traffic between pods through the node’s internal virtual interfaces:

```
Pod A -> veth0 -> node's bridge/routing table -> veth1 -> Pod B
```

The packet never leaves the node and doesn’t require encapsulation.

 **Pods on different hybrid nodes** 

Communication between pods on different hybrid nodes requires encapsulation:

```
Pod A -> CNI -> [VXLAN encapsulation] -> Node A network -> router or gateway -> Node B network -> [VXLAN decapsulation] -> CNI -> Pod B
```

This allows the pod traffic to traverse the physical network infrastructure without requiring the physical network to understand pod IP routing.

# Network traffic flows for hybrid nodes
<a name="hybrid-nodes-concepts-traffic-flows"></a>

This page details the network traffic flows for EKS Hybrid Nodes with diagrams showing the end-to-end network paths for the different traffic types.

The following traffic flows are covered:
+  [Hybrid node `kubelet` to EKS control plane](#hybrid-nodes-concepts-traffic-flows-kubelet-to-cp) 
+  [EKS control plane to hybrid node (`kubelet` server)](#hybrid-nodes-concepts-traffic-flows-cp-to-kubelet) 
+  [Pods running on hybrid nodes to EKS control plane](#hybrid-nodes-concepts-traffic-flows-pods-to-cp) 
+  [EKS control plane to pods running on a hybrid node (webhooks)](#hybrid-nodes-concepts-traffic-flows-cp-to-pod) 
+  [Pod-to-Pod running on hybrid nodes](#hybrid-nodes-concepts-traffic-flows-pod-to-pod) 
+  [Pods on cloud nodes to pods on hybrid nodes (east-west traffic)](#hybrid-nodes-concepts-traffic-flows-east-west) 

## Hybrid node `kubelet` to EKS control plane
<a name="hybrid-nodes-concepts-traffic-flows-kubelet-to-cp"></a>

![\[Hybrid node kubelet to EKS control plane\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-kubelet-to-cp-public.png)


### Request
<a name="_request"></a>

 **1. `kubelet` Initiates Request** 

When the `kubelet` on a hybrid node needs to communicate with the EKS control plane (for example, to report node status or get pod specs), it uses the `kubeconfig` file provided during node registration. This `kubeconfig` has the API server endpoint URL (the Route53 DNS name) rather than direct IP addresses.

The `kubelet` performs a DNS lookup for the endpoint (for example, `https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.gr7.us-west-2.eks.amazonaws.com`). In a public access cluster, this resolves to a public IP address (say `54.239.118.52`) that belongs to the EKS service running in AWS. The `kubelet` then creates a secure HTTPS request to this endpoint. The initial packet looks like this:

```
+--------------------+---------------------+-----------------+
| IP Header          | TCP Header          | Payload         |
| Src: 10.80.0.2     | Src: 52390 (random) |                 |
| Dst: 54.239.118.52 | Dst: 443            |                 |
+--------------------+---------------------+-----------------+
```

 **2. Local Router Routing** 

Since the destination IP is a public IP address and not part of the local network, the `kubelet` sends this packet to its default gateway (the local on-premises router). The router examines the destination IP and determines it’s a public IP address.

For public traffic, the router typically forwards the packet to an internet gateway or border router that handles outbound traffic to the internet. This is omitted in the diagram and depends on how your on-premises network is set up. The packet traverses your on-premises network infrastructure and eventually reaches your internet service provider’s network.

 **3. Delivery to the EKS control plane** 

The packet travels across the public internet and transit networks until it reaches AWS's network. AWS's network routes the packet to the EKS service endpoint in the appropriate region. When the packet reaches the EKS service, it’s forwarded to the actual EKS control plane for your cluster.

This routing through the public internet is different from the private VPC-routed path that we’ll see in other traffic flows. The key difference is that when using public access mode, traffic from on-premises `kubelet` (although not from pods) to the EKS control plane does not go through your VPC - it uses the global internet infrastructure instead.

### Response
<a name="_response"></a>

After the EKS control plane processes the `kubelet` request, it sends a response back:

 **3. EKS control plane sends response** 

The EKS control plane creates a response packet. This packet has the public IP as the source and the hybrid node’s IP as the destination:

```
+--------------------+---------------------+-----------------+
| IP Header          | TCP Header          | Payload         |
| Src: 54.239.118.52 | Src: 443            |                 |
| Dst: 10.80.0.2     | Dst: 52390          |                 |
+--------------------+---------------------+-----------------+
```

 **2. Internet Routing** 

The response packet travels back through the internet, following the routing path determined by internet service providers, until it reaches your on-premises network edge router.

 **1. Local Delivery** 

Your on-premises router receives the packet and recognizes the destination IP (`10.80.0.2`) as belonging to your local network. It forwards the packet through your local network infrastructure until it reaches the target hybrid node, where the `kubelet` receives and processes the response.

## Hybrid node `kube-proxy` to EKS control plane
<a name="_hybrid_node_kube_proxy_to_eks_control_plane"></a>

If you enable public endpoint access for the cluster, traffic from the `kube-proxy` on the hybrid node to the EKS control plane travels over the public internet and follows the same path as the traffic from the `kubelet` to the EKS control plane.

## EKS control plane to hybrid node (`kubelet` server)
<a name="hybrid-nodes-concepts-traffic-flows-cp-to-kubelet"></a>

![\[EKS control plane to hybrid node\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-cp-to-kubelet.png)


### Request
<a name="_request_2"></a>

 **1. EKS Kubernetes API server initiates request** 

The EKS Kubernetes API server retrieves the node’s IP address (`10.80.0.2`) from the node object’s status. It then routes this request through its ENI in the VPC, as the destination IP belongs to the configured remote node CIDR (`10.80.0.0/16`). The initial packet looks like this:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.0.0.132 | Src: 57493 (random) |                 |
| Dst: 10.80.0.2  | Dst: 10250          |                 |
+-----------------+---------------------+-----------------+
```

 **2. VPC network processing** 

The packet leaves the ENI and enters the VPC networking layer, where it’s directed to the subnet’s gateway for further routing.

 **3. VPC route table lookup** 

The VPC route table for the subnet containing the EKS control plane ENI has a specific route (the second one in the diagram) for the remote node CIDR. Based on this routing rule, the packet is directed to the VPC-to-onprem gateway.

 **4. Cross-boundary transit** 

The gateway transfers the packet across the cloud boundary through your established connection (such as Direct Connect or VPN) to your on-premises network.

 **5. On-premises network reception** 

The packet arrives at your local on-premises router that handles traffic for the subnet where your hybrid nodes are located.

 **6. Final delivery** 

The local router identifies that the destination IP address (`10.80.0.2`) belongs to its directly connected network and forwards the packet directly to the target hybrid node, where the `kubelet` receives and processes the request.

### Response
<a name="_response_2"></a>

After the hybrid node’s `kubelet` processes the request, it sends back a response following the same path in reverse:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.80.0.2  | Src: 10250          |                 |
| Dst: 10.0.0.132 | Dst: 57493          |                 |
+-----------------+---------------------+-----------------+
```

 **6. `kubelet` Sends Response** 

The `kubelet` on the hybrid node (`10.80.0.2`) creates a response packet with the original source IP as the destination. The destination doesn’t belong to the local network, so it’s sent to the host’s default gateway, which is the local router.

 **5. Local Router Routing** 

The router determines that the destination IP (`10.0.0.132`) belongs to `10.0.0.0/16`, which has a route pointing to the gateway connecting to AWS.

 **4. Cross-Boundary Return** 

The packet travels back through the same on-premises to VPC connection (such as Direct Connect or VPN), crossing the cloud boundary in the reverse direction.

 **3. VPC Routing** 

When the packet arrives in the VPC, the route tables identify that the destination IP belongs to a VPC CIDR. The packet routes within the VPC.

 **2. VPC Network Delivery** 

The VPC networking layer forwards the packet to the subnet with the EKS control plane ENI (`10.0.0.132`).

 **1. ENI Reception** 

The packet reaches the EKS control plane ENI attached to the Kubernetes API server, completing the round trip.

## Pods running on hybrid nodes to EKS control plane
<a name="hybrid-nodes-concepts-traffic-flows-pods-to-cp"></a>

![\[Pods running on hybrid nodes to EKS control plane\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-pod-to-cp.png)


### Without CNI NAT
<a name="_without_cni_nat"></a>

### Request
<a name="_request_3"></a>

Pods generally talk to the Kubernetes API server through the `kubernetes` service. The service IP is the first IP of the cluster’s service CIDR. This convention allows pods that need to reach the API server before CoreDNS is available (for example, the CNI) to do so. Requests leave the pod with the service IP as the destination. For example, if the service CIDR is `172.16.0.0/16`, the service IP will be `172.16.0.1`.
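
Kubernetes also exposes this address to every pod through the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables, which is what in-cluster clients typically use to build the API server URL. For example, with `my-pod` as a placeholder pod name:

```
kubectl exec my-pod -- env | grep KUBERNETES_SERVICE
```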

 **1. Pod Initiates Request** 

The pod sends a request to the `kubernetes` service IP (`172.16.0.1`) on the API server port (443) from a random source port. The packet looks like this:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.85.1.56 | Src: 57493 (random) |                 |
| Dst: 172.16.0.1 | Dst: 443            |                 |
+-----------------+---------------------+-----------------+
```

 **2. CNI Processing** 

The CNI detects that the destination IP doesn’t belong to any pod CIDR it manages. Since **outgoing NAT is disabled**, the CNI passes the packet to the host network stack without modifying it.

 **3. Node Network Processing** 

The packet enters the node’s network stack where `netfilter` hooks trigger the `iptables` rules set by kube-proxy. Several rules apply in the following order:

1. The packet first hits the `KUBE-SERVICES` chain, which contains rules matching each service’s ClusterIP and port.

1. The matching rule jumps to the `KUBE-SVC-XXX` chain for the `kubernetes` service (packets destined for `172.16.0.1:443`), which contains load balancing rules.

1. The load balancing rule randomly selects one of the `KUBE-SEP-XXX` chains for the control plane ENI IPs (`10.0.0.132` or `10.0.1.23`).

1. The selected `KUBE-SEP-XXX` chain has the actual rule that changes the destination IP from the service IP to the selected IP. This is called Destination Network Address Translation (DNAT).

After these rules are applied, assuming that the selected EKS control plane ENI’s IP is `10.0.0.132`, the packet looks like this:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.85.1.56 | Src: 57493 (random) |                 |
| Dst: 10.0.0.132 | Dst: 443            |                 |
+-----------------+---------------------+-----------------+
```

The node forwards the packet to its default gateway because the destination IP is not in the local network.

 **4. Local Router Routing** 

The local router determines that the destination IP (`10.0.0.132`) belongs to the VPC CIDR (`10.0.0.0/16`) and forwards it to the gateway connecting to AWS.

 **5. Cross-Boundary Transit** 

The packet travels through your established connection (such as Direct Connect or VPN) across the cloud boundary to the VPC.

 **6. VPC Network Delivery** 

The VPC networking layer routes the packet to the correct subnet where the EKS control plane ENI (`10.0.0.132`) is located.

 **7. ENI Reception** 

The packet reaches the EKS control plane ENI attached to the Kubernetes API server.

### Response
<a name="_response_3"></a>

After the EKS control plane processes the request, it sends a response back to the pod:

 **7. API Server Sends Response** 

The EKS Kubernetes API server creates a response packet with the original source IP as the destination. The packet looks like this:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.0.0.132 | Src: 443            |                 |
| Dst: 10.85.1.56 | Dst: 57493          |                 |
+-----------------+---------------------+-----------------+
```

Because the destination IP belongs to the configured remote pod CIDR (`10.85.0.0/16`), the API server sends the packet through its ENI in the VPC, with the subnet’s router as the next hop.

 **6. VPC Routing** 

The VPC route table contains an entry for the remote pod CIDR (`10.85.0.0/16`), directing this traffic to the VPC-to-onprem gateway.

 **5. Cross-Boundary Transit** 

The gateway transfers the packet across the cloud boundary through your established connection (such as Direct Connect or VPN) to your on-premises network.

 **4. On-Premises Network Reception** 

The packet arrives at your local on-premises router.

 **3. Delivery to node** 

The router’s table has an entry for `10.85.1.0/24` with `10.80.0.2` as the next hop, delivering the packet to our node.

 **2. Node Network Processing** 

As the packet is processed by the node’s network stack, `conntrack` (a part of `netfilter`) matches the packet with the connection the pod initially established. Since DNAT was originally applied, `conntrack` reverses the DNAT by rewriting the source IP from the EKS control plane ENI’s IP to the `kubernetes` service IP:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 172.16.0.1 | Src: 443            |                 |
| Dst: 10.85.1.56 | Dst: 57493          |                 |
+-----------------+---------------------+-----------------+
```

 **1. CNI Processing** 

The CNI identifies that the destination IP belongs to a pod in its network and delivers the packet to the correct pod network namespace.

This flow showcases why Remote Pod CIDRs must be properly routable from the VPC all the way to the specific node hosting each pod - the entire return path depends on proper routing of pod IPs across both cloud and on-premises networks.
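
If you want to observe this translation on a hybrid node, and assuming the `conntrack` utility is installed, you can list the tracked connections whose original destination is the `kubernetes` service IP; each entry shows both the original tuple (destination `172.16.0.1`) and the reply tuple with the DNAT-ed control plane ENI IP.

```
sudo conntrack -L -d 172.16.0.1 -p tcp
```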

### With CNI NAT
<a name="_with_cni_nat"></a>

This flow is very similar to the one *without CNI NAT*, but with one key difference: the CNI applies source NAT (SNAT) to the packet before sending it to the node’s network stack. This changes the source IP of the packet to the node’s IP, allowing the packet to be routed back to the node without requiring additional routing configuration.

### Request
<a name="_request_4"></a>

 **1. Pod Initiates Request** 

The pod sends a request to the `kubernetes` service IP (`172.16.0.1`) on the EKS Kubernetes API server port (443) from a random source port. The packet looks like this:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.85.1.56 | Src: 57493 (random) |                 |
| Dst: 172.16.0.1 | Dst: 443            |                 |
+-----------------+---------------------+-----------------+
```

 **2. CNI Processing** 

The CNI detects that the destination IP doesn’t belong to any pod CIDR it manages. Since **outgoing NAT is enabled**, the CNI applies SNAT to the packet, changing the source IP to the node’s IP before passing it to the node’s network stack:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.80.0.2  | Src: 57493 (random) |                 |
| Dst: 172.16.0.1 | Dst: 443            |                 |
+-----------------+---------------------+-----------------+
```

Note: CNI and `iptables` are shown in the example as separate blocks for clarity, but in practice, it’s possible that some CNIs use `iptables` to apply NAT.

 **3. Node Network Processing** 

Here the `iptables` rules set by `kube-proxy` behave the same as in the previous example, load balancing the packet to one of the EKS control plane ENIs. The packet now looks like this:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.80.0.2  | Src: 57493 (random) |                 |
| Dst: 10.0.0.132 | Dst: 443            |                 |
+-----------------+---------------------+-----------------+
```

The node forwards the packet to its default gateway because the destination IP is not in the local network.

 **4. Local Router Routing** 

The local router determines that the destination IP (`10.0.0.132`) belongs to the VPC CIDR (`10.0.0.0/16`) and forwards it to the gateway connecting to AWS.

 **5. Cross-Boundary Transit** 

The packet travels through your established connection (such as Direct Connect or VPN) across the cloud boundary to the VPC.

 **6. VPC Network Delivery** 

The VPC networking layer routes the packet to the correct subnet where the EKS control plane ENI (`10.0.0.132`) is located.

 **7. ENI Reception** 

The packet reaches the EKS control plane ENI attached to the Kubernetes API server.

### Response
<a name="_response_4"></a>

After the EKS control plane processes the request, it sends a response back to the pod:

 **7. API Server Sends Response** 

The EKS Kubernetes API server creates a response packet with the original source IP as the destination. The packet looks like this:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.0.0.132 | Src: 443            |                 |
| Dst: 10.80.0.2  | Dst: 57493          |                 |
+-----------------+---------------------+-----------------+
```

Because the destination IP belongs to the configured remote node CIDR (`10.80.0.0/16`), the API server sends the packet through its ENI in the VPC, with the subnet’s router as the next hop.

 **6. VPC Routing** 

The VPC route table contains an entry for the remote node CIDR (`10.80.0.0/16`), directing this traffic to the VPC-to-onprem gateway.

 **5. Cross-Boundary Transit** 

The gateway transfers the packet across the cloud boundary through your established connection (such as Direct Connect or VPN) to your on-premises network.

 **4. On-Premises Network Reception** 

The packet arrives at your local on-premises router.

 **3. Delivery to node** 

The local router identifies that the destination IP address (`10.80.0.2`) belongs to its directly connected network and forwards the packet directly to the target hybrid node.

 **2. Node Network Processing** 

As the packet is processed by the node’s network stack, `conntrack` (a part of `netfilter`) matches the packet with the connection the pod initially established. Since DNAT was originally applied, `conntrack` reverses it by rewriting the source IP from the EKS control plane ENI’s IP to the `kubernetes` service IP:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 172.16.0.1 | Src: 443            |                 |
| Dst: 10.80.0.2  | Dst: 57493          |                 |
+-----------------+---------------------+-----------------+
```

 **1. CNI Processing** 

The CNI identifies this packet belongs to a connection where it has previously applied SNAT. It reverses the SNAT, changing the destination IP back to the pod’s IP:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 172.16.0.1 | Src: 443            |                 |
| Dst: 10.85.1.56 | Dst: 57493          |                 |
+-----------------+---------------------+-----------------+
```

The CNI detects the destination IP belongs to a pod in its network and delivers the packet to the correct pod network namespace.

This flow showcases how CNI NAT-ing can simplify configuration by allowing packets to be routed back to the node without requiring additional routing for the pod CIDRs.

## EKS control plane to pods running on a hybrid node (webhooks)
<a name="hybrid-nodes-concepts-traffic-flows-cp-to-pod"></a>

![\[EKS control plane to pods running on a hybrid node\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-cp-to-pod.png)


This traffic pattern is most commonly seen with webhooks, where the EKS control plane needs to directly initiate connections to webhook servers running in pods on hybrid nodes. Examples include validating and mutating admission webhooks, which are called by the API server during resource validation or mutation processes.

### Request
<a name="_request_5"></a>

 **1. EKS Kubernetes API server initiates request** 

When a webhook is configured in the cluster and a relevant API operation triggers it, the EKS Kubernetes API server needs to make a direct connection to the webhook server pod. The API server first looks up the pod’s IP address from the Service or Endpoint resource associated with the webhook.

Assuming the webhook pod is running on a hybrid node with IP `10.85.1.23`, the EKS Kubernetes API server creates an HTTPS request to the webhook endpoint. The initial packet is sent through the EKS control plane ENI in your VPC because the destination IP `10.85.1.23` belongs to the configured remote pod CIDR (`10.85.0.0/16`). The packet looks like this:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.0.0.132 | Src: 41892 (random) |                 |
| Dst: 10.85.1.23 | Dst: 8443           |                 |
+-----------------+---------------------+-----------------+
```

 **2. VPC Network Processing** 

The packet leaves the EKS control plane ENI and enters the VPC networking layer with the subnet’s router as the next hop.

 **3. VPC Route Table Lookup** 

The VPC route table for the subnet containing the EKS control plane ENI contains a specific route for the remote pod CIDR (`10.85.0.0/16`). This routing rule directs the packet to the VPC-to-onprem gateway (for example, a Virtual Private Gateway for Direct Connect or VPN connections):

```
Destination     Target
10.0.0.0/16     local
10.85.0.0/16    vgw-id (VPC-to-onprem gateway)
```
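
You can verify that these routes exist with the AWS CLI; the VPC ID below is a placeholder.

```
aws ec2 describe-route-tables \
  --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
  --query 'RouteTables[].Routes[]'
```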

 **4. Cross-Boundary Transit** 

The gateway transfers the packet across the cloud boundary through your established connection (such as Direct Connect or VPN) to your on-premises network. The packet maintains its original source and destination IP addresses as it traverses this connection.

 **5. On-Premises Network Reception** 

The packet arrives at your local on-premises router. The router consults its routing table to determine how to reach the 10.85.1.23 address. For this to work, your on-premises network must have routes for the pod CIDRs that direct traffic to the appropriate hybrid node.

In this case, the router’s route table contains an entry indicating that the `10.85.1.0/24` subnet is reachable through the hybrid node with IP `10.80.0.2`:

```
Destination     Next Hop
10.85.1.0/24    10.80.0.2
```

 **6. Delivery to node** 

Based on the routing table entry, the router forwards the packet to the hybrid node (`10.80.0.2`). When the packet arrives at the node, it looks the same as when the EKS Kubernetes API server sent it, with the destination IP still being the pod’s IP.

 **7. CNI Processing** 

The node’s network stack receives the packet and, seeing that the destination IP is not the node’s own IP, passes it to the CNI for processing. The CNI identifies that the destination IP belongs to a pod running locally on this node and forwards the packet to the correct pod through the appropriate virtual interfaces:

```
Original packet -> node routing -> CNI -> Pod's network namespace
```

The webhook server in the pod receives the request and processes it.

### Response
<a name="_response_5"></a>

After the webhook pod processes the request, it sends back a response following the same path in reverse:

 **7. Pod Sends Response** 

The webhook pod creates a response packet with its own IP as the source and the original requester (the EKS control plane ENI) as the destination:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.85.1.23 | Src: 8443           |                 |
| Dst: 10.0.0.132 | Dst: 41892          |                 |
+-----------------+---------------------+-----------------+
```

The CNI identifies that this packet goes to an external network (not a local pod) and passes the packet to the node’s network stack with the original source IP preserved.

 **6. Node Network Processing** 

The node determines that the destination IP (`10.0.0.132`) is not in the local network and forwards the packet to its default gateway (the local router).

 **5. Local Router Routing** 

The local router consults its routing table and determines that the destination IP (`10.0.0.132`) belongs to the VPC CIDR (`10.0.0.0/16`). It forwards the packet to the gateway connecting to AWS.

 **4. Cross-Boundary Transit** 

The packet travels back through the same on-premises to VPC connection, crossing the cloud boundary in the reverse direction.

 **3. VPC Routing** 

When the packet arrives in the VPC, the route tables identify that the destination IP belongs to a subnet within the VPC. The packet is routed accordingly within the VPC.

 **2. and 1. EKS control plane ENI Reception** 

The packet reaches the ENI attached to the EKS Kubernetes API server, completing the round trip. The API server receives the webhook response and continues processing the original API request based on this response.

This traffic flow demonstrates why remote pod CIDRs must be properly configured and routed:
+ The VPC must have routes for the remote pod CIDRs pointing to the on-premises gateway
+ Your on-premises network must have routes for pod CIDRs that direct traffic to the specific nodes hosting those pods
+ Without this routing configuration, webhooks and other similar services running in pods on hybrid nodes would not be reachable from the EKS control plane.

## Pod-to-Pod running on hybrid nodes
<a name="hybrid-nodes-concepts-traffic-flows-pod-to-pod"></a>

![\[Pod-to Pod running on hybrid nodes\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-pod-to-pod.png)


This section explains how pods running on different hybrid nodes communicate with each other. This example assumes your CNI uses VXLAN for encapsulation, which is common for CNIs such as Cilium or Calico. The overall process is similar for other encapsulation protocols such as Geneve or IP-in-IP.

### Request
<a name="_request_6"></a>

 **1. Pod A Initiates Communication** 

Pod A (`10.85.1.56`) on Node 1 wants to send traffic to Pod B (`10.85.2.67`) on Node 2. The initial packet looks like this:

```
+------------------+-----------------+-------------+-----------------+
| Ethernet Header  | IP Header       | TCP/UDP     | Payload         |
| Src: Pod A MAC   | Src: 10.85.1.56 | Src: 43721  |                 |
| Dst: Gateway MAC | Dst: 10.85.2.67 | Dst: 8080   |                 |
+------------------+-----------------+-------------+-----------------+
```

 **2. CNI Intercepts and Processes the Packet** 

When Pod A’s packet leaves its network namespace, the CNI intercepts it. The CNI consults its routing table and determines the following:
+ The destination IP (`10.85.2.67`) belongs to the pod CIDR
+ This IP is not on the local node but belongs to Node 2 (`10.80.0.3`)
+ The packet needs to be encapsulated with VXLAN

The decision to encapsulate is critical because the underlying physical network doesn’t know how to route pod CIDRs directly - it only knows how to route traffic between node IPs.

The CNI encapsulates the entire original packet inside a VXLAN frame. This effectively creates a "packet within a packet" with new headers:

```
+-----------------+----------------+--------------+------------+---------------------------+
| Outer Ethernet  | Outer IP       | Outer UDP    | VXLAN      | Original Pod-to-Pod       |
| Src: Node1 MAC  | Src: 10.80.0.2 | Src: Random  | VNI: 42    | Packet (unchanged         |
| Dst: Router MAC | Dst: 10.80.0.3 | Dst: 8472    |            | from above)               |
+-----------------+----------------+--------------+------------+---------------------------+
```

Key points about this encapsulation:
+ The outer packet is addressed from Node 1 (`10.80.0.2`) to Node 2 (`10.80.0.3`)
+ UDP port `8472` is the VXLAN port Cilium uses by default
+ The VXLAN Network Identifier (VNI) identifies which overlay network this packet belongs to
+ The entire original packet (with Pod A’s IP as source and Pod B’s IP as destination) is preserved intact inside
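
Before following the packet further, note that you can confirm the encapsulation is happening by capturing the VXLAN traffic on either node. Port `8472` matches the Cilium default mentioned above; use `4789` or your CNI’s configured port if it differs.

```
sudo tcpdump -ni any udp port 8472
```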

The encapsulated packet now enters the regular networking stack of Node 1 and is processed in the same way as any other packet:

1.  **Node Network Processing**: Node 1’s network stack routes the packet based on its destination (`10.80.0.3`)

1.  **Local Network Delivery**:
   + If both nodes are on the same Layer 2 network, the packet is sent directly to Node 2
   + If they’re on different subnets, the packet is forwarded to the local router first

1.  **Router Handling**: The router forwards the packet based on its routing table, delivering it to Node 2

 **3. Receiving Node Processing** 

When the encapsulated packet arrives at Node 2 (`10.80.0.3`):

1. The node’s network stack receives it and identifies it as a VXLAN packet (UDP port `8472` in this example)

1. The packet is passed to the CNI’s VXLAN interface for processing

 **4. VXLAN Decapsulation** 

The CNI on Node 2 processes the VXLAN packet:

1. It strips away the outer headers (Ethernet, IP, UDP, and VXLAN)

1. It extracts the original inner packet

1. The packet is now back to its original form:

```
+------------------+-----------------+-------------+-----------------+
| Ethernet Header  | IP Header       | TCP/UDP     | Payload         |
| Src: Pod A MAC   | Src: 10.85.1.56 | Src: 43721  |                 |
| Dst: Gateway MAC | Dst: 10.85.2.67 | Dst: 8080   |                 |
+------------------+-----------------+-------------+-----------------+
```

The CNI on Node 2 examines the destination IP (`10.85.2.67`) and:

1. Identifies that this IP belongs to a local pod

1. Routes the packet through the appropriate virtual interfaces

1. Delivers the packet to Pod B’s network namespace

### Response
<a name="_response_6"></a>

When Pod B responds to Pod A, the entire process happens in reverse:

1. Pod B sends a packet to Pod A (`10.85.1.56`)

1. Node 2’s CNI encapsulates it with VXLAN, setting the destination to Node 1 (`10.80.0.2`)

1. The encapsulated packet is delivered to Node 1

1. Node 1’s CNI decapsulates it and delivers the original response to Pod A

## Pods on cloud nodes to pods on hybrid nodes (east-west traffic)
<a name="hybrid-nodes-concepts-traffic-flows-east-west"></a>

![\[Pods on cloud nodes to pods on hybrid nodes\]](http://docs.aws.amazon.com/eks/latest/userguide/images/hybrid-nodes-east-west.png)


### Request
<a name="_request_7"></a>

 **1. Pod A Initiates Communication** 

Pod A (`10.0.0.56`) on the EC2 Node wants to send traffic to Pod B (`10.85.1.56`) on the Hybrid Node. The initial packet looks like this:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.0.0.56  | Src: 52390 (random) |                 |
| Dst: 10.85.1.56 | Dst: 8080           |                 |
+-----------------+---------------------+-----------------+
```

With the VPC CNI, Pod A has an IP from the VPC CIDR and is directly attached to an ENI on the EC2 instance. The pod’s network namespace is connected to the VPC network, so the packet enters the VPC routing infrastructure directly.

 **2. VPC Routing** 

The VPC route table contains a specific route for the Remote Pod CIDR (`10.85.0.0/16`), directing this traffic to the VPC-to-onprem gateway:

```
Destination     Target
10.0.0.0/16     local
10.85.0.0/16    vgw-id (VPC-to-onprem gateway)
```

Based on this routing rule, the packet is directed toward the gateway connecting to your on-premises network.

 **3. Cross-Boundary Transit** 

The gateway transfers the packet across the cloud boundary through your established connection (such as Direct Connect or VPN) to your on-premises network. The packet maintains its original source and destination IP addresses throughout this transit.

 **4. On-Premises Network Reception** 

The packet arrives at your local on-premises router. The router consults its routing table to determine the next hop for reaching the 10.85.1.56 address. Your on-premises router must have routes for the pod CIDRs that direct traffic to the appropriate hybrid node.

The router’s table has an entry indicating that the `10.85.1.0/24` subnet is reachable through the hybrid node with IP `10.80.0.2`:

```
Destination     Next Hop
10.85.1.0/24    10.80.0.2
```

 **5. Node Network Processing** 

The router forwards the packet to the hybrid node (`10.80.0.2`). When the packet arrives at the node, it still has Pod A’s IP as the source and Pod B’s IP as the destination.

 **6. CNI Processing** 

The node’s network stack receives the packet and, seeing that the destination IP is not its own, passes it to the CNI for processing. The CNI identifies that the destination IP belongs to a pod running locally on this node and forwards the packet to the correct pod through the appropriate virtual interfaces:

```
Original packet -> node routing -> CNI -> Pod B's network namespace
```

Pod B receives the packet and processes it as needed.

### Response
<a name="_response_7"></a>

 **6. Pod B Sends Response** 

Pod B creates a response packet with its own IP as the source and Pod A’s IP as the destination:

```
+-----------------+---------------------+-----------------+
| IP Header       | TCP Header          | Payload         |
| Src: 10.85.1.56 | Src: 8080           |                 |
| Dst: 10.0.0.56  | Dst: 52390          |                 |
+-----------------+---------------------+-----------------+
```

The CNI identifies that this packet is destined for an external network and passes it to the node’s network stack.

 **5. Node Network Processing** 

The node determines that the destination IP (`10.0.0.56`) does not belong to the local network and forwards the packet to its default gateway (the local router).

 **4. Local Router Routing** 

The local router consults its routing table and determines that the destination IP (`10.0.0.56`) belongs to the VPC CIDR (`10.0.0.0/16`). It forwards the packet to the gateway connecting to AWS.

 **3. Cross-Boundary Transit** 

The packet travels back through the same on-premises to VPC connection, crossing the cloud boundary in the reverse direction.

 **2. VPC Routing** 

When the packet arrives in the VPC, the routing system identifies that the destination IP belongs to a subnet within the VPC. The packet is routed through the VPC network toward the EC2 instance hosting Pod A.

 **1. Pod A Receives Response** 

The packet arrives at the EC2 instance and is delivered directly to Pod A through its attached ENI. Since the VPC CNI doesn’t use overlay networking for pods in the VPC, no additional decapsulation is needed - the packet arrives with its original headers intact.

This east-west traffic flow demonstrates why remote pod CIDRs must be properly configured and routable from both directions:
+ The VPC must have routes for the remote pod CIDRs pointing to the on-premises gateway
+ Your on-premises network must have routes for pod CIDRs that direct traffic to the specific nodes hosting those pods.

# Hybrid nodes `nodeadm` reference
<a name="hybrid-nodes-nodeadm"></a>

The Amazon EKS Hybrid Nodes CLI (`nodeadm`) simplifies the installation, configuration, registration, and uninstallation of the hybrid nodes components. You can include `nodeadm` in your operating system images to automate hybrid node bootstrap. For more information, see [Prepare operating system for hybrid nodes](hybrid-nodes-os.md).

The `nodeadm` version for hybrid nodes differs from the `nodeadm` version used for bootstrapping Amazon EC2 instances as nodes in Amazon EKS clusters. Follow the documentation and references for the appropriate `nodeadm` version. This documentation page is for the hybrid nodes `nodeadm` version.

The source code for the hybrid nodes `nodeadm` is published in the https://github.com/aws/eks-hybrid GitHub repository.

**Important**  
You must run `nodeadm` with a user that has root/sudo privileges.

## Download `nodeadm`
<a name="_download_nodeadm"></a>

The hybrid nodes version of `nodeadm` is hosted in Amazon S3 fronted by Amazon CloudFront. To install `nodeadm`, run the following command on each of your on-premises hosts.

 **For x86_64 hosts** 

```
curl -OL 'https://hybrid-assets.eks.amazonaws.com/releases/latest/bin/linux/amd64/nodeadm'
```

 **For ARM hosts** 

```
curl -OL 'https://hybrid-assets.eks.amazonaws.com/releases/latest/bin/linux/arm64/nodeadm'
```

Add executable file permission to the downloaded binary on each host.

```
chmod +x nodeadm
```

## `nodeadm install`
<a name="_nodeadm_install"></a>

The `nodeadm install` command is used to install the artifacts and dependencies required to run and join hybrid nodes to an Amazon EKS cluster. The `nodeadm install` command can be run individually on each hybrid node or can be run during image build pipelines to preinstall the hybrid nodes dependencies in operating system images.

 **Usage** 

```
nodeadm install [KUBERNETES_VERSION] [flags]
```

 **Positional Arguments** 

(Required) `KUBERNETES_VERSION` The major.minor version of EKS Kubernetes to install, for example `1.32` 

 **Flags** 


| Name | Required | Description | 
| --- | --- | --- | 
|   `-p`,  `--credential-provider`   |  TRUE  |  Credential provider to install. Supported values are `iam-ra` and `ssm`. See [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md) for more information.  | 
|   `-s`,  `--containerd-source`   |  FALSE  |  Source for `containerd`. `nodeadm` supports installing `containerd` from the OS distro, Docker packages, and skipping `containerd` install.  **Values**   `distro` - This is the default value. `nodeadm` will install the latest `containerd` package distributed by the node OS that is compatible with the EKS Kubernetes version. `distro` is not a supported value for Red Hat Enterprise Linux (RHEL) operating systems.  `docker` - `nodeadm` will install the latest `containerd` package built and distributed by Docker that is compatible with the EKS Kubernetes version. `docker` is not a supported value for Amazon Linux 2023.  `none` - `nodeadm` will not install `containerd` package. You must manually install `containerd` before running `nodeadm init`.  | 
|   `-r`,  `--region`   |  FALSE  |  Specifies the AWS Region for downloading artifacts such as the SSM Agent. Defaults to `us-west-2`.  | 
|   `-t`,  `--timeout`   |  FALSE  |  Maximum install command duration. The input follows duration format. For example `1h23m`. Default download timeout for install command is set to 20 minutes.  | 
|   `-h`, `--help`   |  FALSE  |  Displays help message with available flag, subcommand and positional value parameters.  | 

 **Examples** 

Install Kubernetes version `1.32` with AWS Systems Manager (SSM) as the credential provider

```
nodeadm install 1.32 --credential-provider ssm
```

Install Kubernetes version `1.32` with AWS Systems Manager (SSM) as the credential provider, Docker as the containerd source, with a download timeout of 20 minutes.

```
nodeadm install 1.32 --credential-provider ssm --containerd-source docker --timeout 20m
```

Install Kubernetes version `1.32` with AWS IAM Roles Anywhere as the credential provider

```
nodeadm install 1.32 --credential-provider iam-ra
```

## `nodeadm config check`
<a name="_nodeadm_config_check"></a>

The `nodeadm config check` command checks the provided node configuration for errors. This command can be used to verify and validate the correctness of a hybrid node configuration file.

 **Usage** 

```
nodeadm config check [flags]
```

 **Flags** 


| Name | Required | Description | 
| --- | --- | --- | 
|   `-c`,  `--config-source`   |  TRUE  |  Source of nodeadm configuration. For hybrid nodes the input should follow a URI with file scheme.  | 
|   `-h`, `--help`   |  FALSE  |  Displays help message with available flag, subcommand and positional value parameters.  | 

 **Examples** 

```
nodeadm config check -c file://nodeConfig.yaml
```

## `nodeadm init`
<a name="_nodeadm_init"></a>

The `nodeadm init` command starts and connects the hybrid node with the configured Amazon EKS cluster. See [Node Config for SSM hybrid activations](#hybrid-nodes-node-config-ssm) or [Node Config for IAM Roles Anywhere](#hybrid-nodes-node-config-iamra) for details of how to configure the `nodeConfig.yaml` file.

 **Usage** 

```
nodeadm init [flags]
```

 **Flags** 


| Name | Required | Description | 
| --- | --- | --- | 
|   `-c`,  `--config-source`   |  TRUE  |  Source of `nodeadm` configuration. For hybrid nodes the input should follow a URI with file scheme.  | 
|   `-s`,  `--skip`   |  FALSE  |  Phases of `init` to be skipped. It is not recommended to skip any of the phases unless it helps to fix an issue.  **Values**   `install-validation` skips checking if the preceding install command ran successfully.  `cni-validation` skips checking whether the Cilium or Calico CNI VXLAN ports are open when a firewall is enabled on the node.  `node-ip-validation` skips checking if the node IP falls within a CIDR in the remote node networks.  | 
|   `-h`, `--help`   |  FALSE  |  Displays help message with available flag, subcommand and positional value parameters.  | 

 **Examples** 

```
nodeadm init -c file://nodeConfig.yaml
```

## `nodeadm upgrade`
<a name="_nodeadm_upgrade"></a>

The `nodeadm upgrade` command upgrades all the installed artifacts to the latest version and bootstraps the node to configure the upgraded artifacts and join the EKS cluster on AWS. Upgrade is disruptive to the workloads running on the node. Move your workloads to another node before running upgrade.

 **Usage** 

```
nodeadm upgrade [KUBERNETES_VERSION] [flags]
```

 **Positional Arguments** 

(Required) `KUBERNETES_VERSION` The major.minor version of EKS Kubernetes to install, for example `1.32` 

 **Flags** 


| Name | Required | Description | 
| --- | --- | --- | 
|   `-c`,  `--config-source`   |  TRUE  |  Source of `nodeadm` configuration. For hybrid nodes the input should follow a URI with file scheme.  | 
|   `-t`,  `--timeout`   |  FALSE  |  Timeout for downloading artifacts. The input follows duration format. For example 1h23m. Default download timeout for upgrade command is set to 10 minutes.  | 
|   `-s`,  `--skip`   |  FALSE  |  Phases of upgrade to be skipped. It is not recommended to skip any of the phases unless it helps to fix an issue.  **Values**   `pod-validation` skips checking that no pods other than daemon sets and static pods are running on the node.  `node-validation` skips checking if the node has been cordoned.  `init-validation` skips checking if the node has been initialized successfully before running upgrade.  `containerd-major-version-upgrade` prevents containerd major version upgrades during node upgrade.  | 
|   `-h`, `--help`   |  FALSE  |  Displays help message with available flag, subcommand and positional value parameters.  | 

 **Examples** 

```
nodeadm upgrade 1.32 -c file://nodeConfig.yaml
```

```
nodeadm upgrade 1.32 -c file://nodeConfig.yaml --timeout 20m
```

## `nodeadm uninstall`
<a name="_nodeadm_uninstall"></a>

The `nodeadm uninstall` command stops and removes the artifacts `nodeadm` installs during `nodeadm install`, including the kubelet and containerd. Note that the uninstall command does not drain or delete your hybrid nodes from your cluster. You must run the drain and delete operations separately; see [Remove hybrid nodes](hybrid-nodes-remove.md) for more information and the example at the end of this section. By default, `nodeadm uninstall` will not proceed if there are pods remaining on the node. Similarly, `nodeadm uninstall` does not remove CNI dependencies or dependencies of other Kubernetes add-ons you run on your cluster. To fully remove the CNI installation from your host, see the instructions at [Configure CNI for hybrid nodes](hybrid-nodes-cni.md). If you are using AWS SSM hybrid activations as your on-premises credentials provider, the `nodeadm uninstall` command deregisters your hosts as AWS SSM managed instances.

 **Usage** 

```
nodeadm uninstall [flags]
```

 **Flags** 


| Name | Required | Description | 
| --- | --- | --- | 
|   `-s`,  `--skip`   |  FALSE  |  Phases of uninstall to skip. Skipping phases is not recommended unless it helps to fix an issue.  **Values**   `pod-validation` skips checking that no pods are running on the node, except daemon sets and static pods.  `node-validation` skips checking whether the node has been cordoned.  `init-validation` skips checking whether the node was initialized successfully before running uninstall.  | 
|   `-h`,  `--help`   |  FALSE  |  Displays the help message with the available flags, subcommands, and positional parameters.  | 
|   `-f`,  `--force`   |  FALSE  |  Force delete additional directories that might contain remaining files from Kubernetes and CNI components.  **WARNING**  This will delete all contents in default Kubernetes and CNI directories (`/var/lib/cni`, `/etc/cni/net.d`, etc). Do not use this flag if you store your own data in these locations. Starting from nodeadm `v1.0.9`, the `./nodeadm uninstall --skip node-validation,pod-validation --force` command no longer deletes the `/var/lib/kubelet` directory. This is because it may contain Pod volumes and volume-subpath directories that sometimes include the mounted node filesystem.  **Safe handling tips**  - Deleting mounted paths can lead to accidental deletion of the actual mounted node filesystem. Before manually deleting the `/var/lib/kubelet` directory, carefully inspect all active mounts and unmount volumes safely to avoid data loss.  | 

 **Examples** 

```
nodeadm uninstall
```

```
nodeadm uninstall --skip node-validation,pod-validation
```
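
The following example combines skipped validations with the `--force` flag described above. Review the warning for that flag before using it, because it deletes the contents of the default Kubernetes and CNI directories.

```
nodeadm uninstall --skip node-validation,pod-validation --force
```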

## `nodeadm debug`
<a name="_nodeadm_debug"></a>

The `nodeadm debug` command can be used to troubleshoot unhealthy or misconfigured hybrid nodes. It validates that the following requirements are in place:
+ The node has network access to the required AWS APIs for obtaining credentials.
+ The node can obtain AWS credentials for the configured Hybrid Nodes IAM role.
+ The node has network access to the EKS Kubernetes API endpoint, and the EKS Kubernetes API endpoint certificate is valid.
+ The node can authenticate with the EKS cluster, its identity in the cluster is valid, and the node has access to the EKS cluster through the VPC configured for the EKS cluster.

If errors are found, the command’s output suggests troubleshooting steps. Certain validation steps run child processes. If these fail, their output is shown in a stderr section under the validation error.

 **Usage** 

```
nodeadm debug [flags]
```

 **Flags** 


| Name | Required | Description | 
| --- | --- | --- | 
|   `-c`, `--config-source`   |  TRUE  |  Source of `nodeadm` configuration. For hybrid nodes the input should follow a URI with file scheme.  | 
|   `--no-color`   |  FALSE  |  Disables color output. Useful for automation.  | 
|   `-h`, `--help`   |  FALSE  |  Displays the help message with the available flags, subcommands, and positional parameters.  | 

 **Examples** 

```
nodeadm debug -c file://nodeConfig.yaml
```

## Nodeadm file locations
<a name="_nodeadm_file_locations"></a>

### nodeadm install
<a name="_nodeadm_install_2"></a>

When running `nodeadm install`, the following files and file locations are configured.


| Artifact | Path | 
| --- | --- | 
|  IAM Roles Anywhere CLI  |  /usr/local/bin/aws_signing_helper  | 
|  Kubelet binary  |  /usr/bin/kubelet  | 
|  Kubectl binary  |  /usr/local/bin/kubectl  | 
|  ECR Credentials Provider  |  /etc/eks/image-credential-provider/ecr-credential-provider  | 
|   AWS IAM Authenticator  |  /usr/local/bin/aws-iam-authenticator  | 
|  SSM Setup CLI  |  /opt/ssm/ssm-setup-cli  | 
|  SSM Agent  |  On Ubuntu - /snap/amazon-ssm-agent/current/amazon-ssm-agent On RHEL & AL2023 - /usr/bin/amazon-ssm-agent  | 
|  Containerd  |  On Ubuntu & AL2023 - /usr/bin/containerd On RHEL - /bin/containerd  | 
|  Iptables  |  On Ubuntu & AL2023 - /usr/sbin/iptables On RHEL - /sbin/iptables  | 
|  CNI plugins  |  /opt/cni/bin  | 
|  installed artifacts tracker  |  /opt/nodeadm/tracker  | 

### nodeadm init
<a name="_nodeadm_init_2"></a>

When running `nodeadm init`, the following files and file locations are configured.


| Name | Path | 
| --- | --- | 
|  Kubelet kubeconfig  |  /var/lib/kubelet/kubeconfig  | 
|  Kubelet config  |  /etc/kubernetes/kubelet/config.json  | 
|  Kubelet systemd unit  |  /etc/systemd/system/kubelet.service  | 
|  Image credentials provider config  |  /etc/eks/image-credential-provider/config.json  | 
|  Kubelet env file  |  /etc/eks/kubelet/environment  | 
|  Kubelet Certs  |  /etc/kubernetes/pki/ca.crt  | 
|  Containerd config  |  /etc/containerd/config.toml  | 
|  Containerd kernel modules config  |  /etc/modules-load.d/containerd.conf  | 
|   AWS config file  |  /etc/aws/hybrid/config  | 
|   AWS credentials file (if the credentials file is enabled)  |  /eks-hybrid/.aws/credentials  | 
|   AWS signing helper systemd unit  |  /etc/systemd/system/aws_signing_helper_update.service  | 
|  Sysctl conf file  |  /etc/sysctl.d/99-nodeadm.conf  | 
|  Ca-certificates  |  /etc/ssl/certs/ca-certificates.crt  | 
|  Gpg key file  |  /etc/apt/keyrings/docker.asc  | 
|  Docker repo source file  |  /etc/apt/sources.list.d/docker.list  | 

## Node Config for SSM hybrid activations
<a name="hybrid-nodes-node-config-ssm"></a>

The following is a sample `nodeConfig.yaml` when using AWS SSM hybrid activations for hybrid nodes credentials.

```
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name:             # Name of the EKS cluster
    region:           # AWS Region where the EKS cluster resides
  hybrid:
    ssm:
      activationCode: # SSM hybrid activation code
      activationId:   # SSM hybrid activation id
```

## Node Config for IAM Roles Anywhere
<a name="hybrid-nodes-node-config-iamra"></a>

The following is a sample `nodeConfig.yaml` for AWS IAM Roles Anywhere for hybrid nodes credentials.

When using AWS IAM Roles Anywhere as your on-premises credentials provider, the `nodeName` you use in your `nodeadm` configuration must align with the permissions you scoped for your Hybrid Nodes IAM role. For example, if your permissions for the Hybrid Nodes IAM role only allow AWS IAM Roles Anywhere to assume the role when the role session name is equal to the CN of the host certificate, then the `nodeName` in your `nodeadm` configuration must be the same as the CN of your certificates. The `nodeName` that you use can’t be longer than 64 characters. For more information, see [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md).

```
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name:              # Name of the EKS cluster
    region:            # AWS Region where the EKS cluster resides
  hybrid:
    iamRolesAnywhere:
      nodeName:        # Name of the node
      trustAnchorArn:  # ARN of the IAM Roles Anywhere trust anchor
      profileArn:      # ARN of the IAM Roles Anywhere profile
      roleArn:         # ARN of the Hybrid Nodes IAM role
      certificatePath: # Path to the certificate file to authenticate with the IAM Roles Anywhere trust anchor
      privateKeyPath:  # Path to the private key file for the certificate
```

## Node Config for customizing kubelet (Optional)
<a name="hybrid-nodes-nodeadm-kubelet"></a>

You can pass kubelet configuration and flags in your `nodeadm` configuration. See the example below for how to add an additional node label `abc.company.com/test-label` and config for setting `shutdownGracePeriod` to 30 seconds.

```
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name:             # Name of the EKS cluster
    region:           # AWS Region where the EKS cluster resides
  kubelet:
    config:           # Map of kubelet config and values
       shutdownGracePeriod: 30s
    flags:            # List of kubelet flags
       - --node-labels=abc.company.com/test-label=true
  hybrid:
    ssm:
      activationCode: # SSM hybrid activation code
      activationId:   # SSM hybrid activation id
```
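
After the node joins the cluster, you can confirm that the label was applied with a command similar to the following; the node name is a placeholder.

```
kubectl get node NODE_NAME --show-labels
```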

## Node Config for customizing containerd (Optional)
<a name="_node_config_for_customizing_containerd_optional"></a>

You can pass custom containerd configuration in your `nodeadm` configuration. The containerd configuration for `nodeadm` accepts in-line TOML. See the example below for how to configure containerd to disable deletion of unpacked image layers in the containerd content store.

```
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name:             # Name of the EKS cluster
    region:           # AWS Region where the EKS cluster resides
  containerd:
    config: |         # Inline TOML containerd additional configuration
       [plugins."io.containerd.grpc.v1.cri".containerd]
       discard_unpacked_layers = false
  hybrid:
    ssm:
      activationCode: # SSM hybrid activation code
      activationId:   # SSM hybrid activation id
```

**Note**  
Containerd versions 1.x and 2.x use different configuration formats. Containerd 1.x uses config version 2, while containerd 2.x uses config version 3. Although containerd 2.x remains backward compatible with config version 2, config version 3 is recommended for optimal performance. Check your containerd version with `containerd --version` or review `nodeadm` install logs. For more details on config versioning, see https://containerd.io/releases/

You can also use the containerd configuration to enable SELinux support. With SELinux enabled on containerd, ensure that Pods scheduled on the node have the proper `securityContext` with `seLinuxOptions` set; an example Pod spec is shown after the containerd configuration below. More information on configuring a security context can be found in the [Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/).

**Note**  
Red Hat Enterprise Linux (RHEL) 8 and RHEL 9 have SELinux enabled by default and set to enforcing mode on the host. Amazon Linux 2023 has SELinux enabled by default and set to permissive mode. When SELinux is set to permissive mode on the host, enabling it on containerd will not block requests but will log them according to the SELinux configuration on the host.

```
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name:             # Name of the EKS cluster
    region:           # AWS Region where the EKS cluster resides
  containerd:
    config: |         # Inline TOML containerd additional configuration
       [plugins."io.containerd.grpc.v1.cri"]
       enable_selinux = true
  hybrid:
    ssm:
      activationCode: # SSM hybrid activation code
      activationId:   # SSM hybrid activation id
```
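
As a minimal sketch of the Pod-side configuration mentioned above (the SELinux level and image are examples only; adjust them for your environment and policy), a Pod `securityContext` with `seLinuxOptions` looks similar to the following.

```
apiVersion: v1
kind: Pod
metadata:
  name: selinux-example
spec:
  securityContext:
    seLinuxOptions:          # Example SELinux label; set a level appropriate for your policy
      level: "s0:c123,c456"
  containers:
  - name: app
    image: public.ecr.aws/docker/library/busybox:stable   # Example image
    command: ["sleep", "3600"]
```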

# Troubleshooting hybrid nodes
<a name="hybrid-nodes-troubleshooting"></a>

This topic covers some common errors that you might see while using Amazon EKS Hybrid Nodes and how to fix them. For other troubleshooting information, see [Troubleshoot problems with Amazon EKS clusters and nodes](troubleshooting.md) and [Knowledge Center tag for Amazon EKS](https://repost.aws/tags/knowledge-center/TA4IvCeWI1TE66q4jEj4Z9zg/amazon-elastic-kubernetes-service) on * AWS re:Post*. If you cannot resolve the issue, contact AWS Support.

 **Node troubleshooting with `nodeadm debug`** 

You can run the `nodeadm debug` command from your hybrid nodes to validate that networking and credential requirements are met. For more information on the `nodeadm debug` command, see [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md).

 **Detect issues with your hybrid nodes with cluster insights** 

Amazon EKS cluster insights includes *insight checks* that detect common issues with the configuration of EKS Hybrid Nodes in your cluster. You can view the results of all insight checks from the AWS Management Console, AWS CLI, and the AWS SDKs. For more information about cluster insights, see [Prepare for Kubernetes version upgrades and troubleshoot misconfigurations with cluster insights](cluster-insights.md).

## Installing hybrid nodes troubleshooting
<a name="hybrid-nodes-troubleshooting-install"></a>

The following troubleshooting topics are related to installing the hybrid nodes dependencies on hosts with the `nodeadm install` command.

 ** `nodeadm` command failed `must run as root` ** 

The `nodeadm install` command must be run with a user that has root or `sudo` privileges on your host. If you run `nodeadm install` with a user that does not have root or `sudo` privileges, you will see the following error in the `nodeadm` output.

```
"msg":"Command failed","error":"must run as root"
```

 **Unable to connect to dependencies** 

The `nodeadm install` command installs the dependencies required for hybrid nodes. The hybrid nodes dependencies include `containerd`, `kubelet`, `kubectl`, and AWS SSM or AWS IAM Roles Anywhere components. You must have access from where you are running `nodeadm install` to download these dependencies. For more information on the list of locations that you must be able to access, see [Prepare networking for hybrid nodes](hybrid-nodes-networking.md). If you do not have access, you will see errors similar to the following in the `nodeadm install` output.

```
"msg":"Command failed","error":"failed reading file from url: ...: max retries achieved for http request"
```

 **Failed to update package manager** 

The `nodeadm install` command runs `apt update`, `yum update`, or `dnf update` before installing the hybrid nodes dependencies. If this step does not succeed, you might see errors similar to the following. To remediate, you can run `apt update`, `yum update`, or `dnf update` before running `nodeadm install`, or you can attempt to re-run `nodeadm install`.

```
failed to run update using package manager
```
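
For example, depending on your operating system, you might run one of the following before retrying `nodeadm install`.

```
sudo apt update      # Ubuntu
sudo yum update -y   # RHEL
sudo dnf update -y   # AL2023
```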

 **Timeout or context deadline exceeded** 

When running `nodeadm install`, if you see issues at various stages of the install process with errors that indicate a timeout or context deadline exceeded, you might have a slow connection that is preventing the installation of the hybrid nodes dependencies before the timeouts are reached. To work around these issues, you can use the `--timeout` flag in `nodeadm` to extend the duration of the timeouts for downloading the dependencies.

```
nodeadm install K8S_VERSION --credential-provider CREDS_PROVIDER --timeout 20m0s
```

## Connecting hybrid nodes troubleshooting
<a name="hybrid-nodes-troubleshooting-connect"></a>

The troubleshooting topics in this section are related to the process of connecting hybrid nodes to EKS clusters with the `nodeadm init` command.

 **Operation errors or unsupported scheme** 

When running `nodeadm init`, if you see errors related to `operation error` or `unsupported scheme`, check your `nodeConfig.yaml` to make sure it is properly formatted and passed to `nodeadm`. For more information on the format and options for `nodeConfig.yaml`, see [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md).

```
"msg":"Command failed","error":"operation error ec2imds: GetRegion, request canceled, context deadline exceeded"
```

 **Hybrid Nodes IAM role missing permissions for the `eks:DescribeCluster` action** 

When running `nodeadm init`, `nodeadm` attempts to gather information about your EKS cluster by calling the EKS `DescribeCluster` action. If your Hybrid Nodes IAM role does not have permission for the `eks:DescribeCluster` action, then you must pass your Kubernetes API endpoint, cluster CA bundle, and service IPv4 CIDR in the node configuration you pass to `nodeadm` when you run `nodeadm init`. For more information on the required permissions for the Hybrid Nodes IAM role, see [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md).

```
"msg":"Command failed","error":"operation error EKS: DescribeCluster, https response error StatusCode: 403 ... AccessDeniedException"
```
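
If you take this approach, you can retrieve the required values with a command similar to the following, run with credentials that have the `eks:DescribeCluster` permission (for example, from your workstation). The placeholders are your cluster name and AWS Region.

```
aws eks describe-cluster --name CLUSTER_NAME --region REGION_NAME \
  --query "{endpoint: cluster.endpoint, certificateAuthority: cluster.certificateAuthority.data, serviceIpv4Cidr: cluster.kubernetesNetworkConfig.serviceIpv4Cidr}"
```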

 **Hybrid Nodes IAM role missing permissions for the `eks:ListAccessEntries` action** 

When running `nodeadm init`, `nodeadm` attempts to validate whether your EKS cluster has an access entry of type `HYBRID_LINUX` associated with the Hybrid Nodes IAM role by calling the EKS `ListAccessEntries` action. If your Hybrid Nodes IAM role does not have permission for the `eks:ListAccessEntries` action, then you must pass the `--skip cluster-access-validation` flag when you run the `nodeadm init` command. For more information on the required permissions for the Hybrid Nodes IAM role, see [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md).

```
"msg":"Command failed","error":"operation error EKS: ListAccessEntries, https response error StatusCode: 403 ... AccessDeniedException"
```
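
In that case, the `nodeadm init` invocation looks similar to the following.

```
nodeadm init -c file://nodeConfig.yaml --skip cluster-access-validation
```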

 **Node IP not in remote node network CIDR** 

When running `nodeadm init`, you might encounter an error if the node’s IP address is not within the specified remote node network CIDRs. The error will look similar to the following example:

```
node IP 10.18.0.1 is not in any of the remote network CIDR blocks [10.0.0.0/16 192.168.0.0/16]
```

This example shows a node with IP 10.18.0.1 attempting to join a cluster with remote network CIDRs 10.0.0.0/16 and 192.168.0.0/16. The error occurs because 10.18.0.1 isn’t within either of the ranges.

Confirm that you’ve properly configured your `RemoteNodeNetworks` to include all node IP addresses. For more information on networking configuration, see [Prepare networking for hybrid nodes](hybrid-nodes-networking.md).
+ Run the following command in the AWS Region where your cluster is located to check your `RemoteNodeNetwork` configurations. Verify that the CIDR blocks listed in the output include the IP range of your node and are the same as the CIDR blocks listed in the error message. If they do not match, confirm the cluster name and region in your `nodeConfig.yaml` match your intended cluster.

```
aws eks describe-cluster --name CLUSTER_NAME --region REGION_NAME --query cluster.remoteNetworkConfig.remoteNodeNetworks
```
+ Verify you’re working with the intended node:
  + Confirm you’re on the correct node by checking its hostname and IP address match the one you intend to register with the cluster.
  + Confirm this node is in the correct on-premises network (the one whose CIDR range was registered as `RemoteNodeNetwork` during cluster setup).

If your node IP is still not what you expected, check the following:
+ If you are using IAM Roles Anywhere, `kubelet` performs a DNS lookup on the IAM Roles Anywhere `nodeName` and uses an IP associated with the node name if available. If you maintain DNS entries for your nodes, confirm that these entries point to IPs within your remote node network CIDRs.
+ If your node has multiple network interfaces, `kubelet` might select an interface with an IP address outside your remote node network CIDRs as the default. To use a different interface, specify its IP address using the `--node-ip` `kubelet` flag in your `nodeConfig.yaml`; a sketch of this configuration is shown after the following command. For more information, see [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md). You can view your node’s network interfaces and their IP addresses by running the following command on your node:

```
ip addr show
```
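
As a sketch of the kubelet flag mentioned above (the credential configuration shown is the SSM variant from earlier on this page, and `NODE_IP` is a placeholder), the `--node-ip` flag can be passed through the `kubelet` flags list in your `nodeConfig.yaml`.

```
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name:             # Name of the EKS cluster
    region:           # AWS Region where the EKS cluster resides
  kubelet:
    flags:            # List of kubelet flags
      - --node-ip=NODE_IP   # Use an IP within your remote node network CIDRs
  hybrid:
    ssm:
      activationCode: # SSM hybrid activation code
      activationId:   # SSM hybrid activation id
```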

 **Hybrid nodes are not appearing in EKS cluster** 

If you ran `nodeadm init` and it completed but your hybrid nodes do not appear in your cluster, there might be issues with the network connection between your hybrid nodes and the EKS control plane, you might not have the required security group permissions configured, or you might not have the required mapping of your Hybrid Nodes IAM role to Kubernetes Role-Based Access Control (RBAC). You can start the debugging process by checking the status of `kubelet` and the `kubelet` logs with the following commands. Run the following commands from the hybrid nodes that failed to join your cluster.

```
systemctl status kubelet
```

```
journalctl -u kubelet -f
```

 **Unable to communicate with cluster** 

If your hybrid node was unable to communicate with the cluster control plane, you might see logs similar to the following.

```
"Failed to ensure lease exists, will retry" err="Get ..."
```

```
"Unable to register node with API server" err="Post ..."
```

```
Failed to contact API server when waiting for CSINode publishing ... dial tcp <ip address>: i/o timeout
```

If you see these messages, check the following to ensure your environment meets the hybrid nodes requirements detailed in [Prepare networking for hybrid nodes](hybrid-nodes-networking.md).
+ Confirm the VPC passed to your EKS cluster has routes to your Transit Gateway (TGW) or Virtual Private Gateway (VGW) for your on-premises node and optionally pod CIDRs.
+ Confirm you have an additional security group for your EKS cluster with inbound rules for your on-premises node CIDRs and optionally pod CIDRs.
+ Confirm your on-premises router is configured to allow traffic to and from the EKS control plane.

 **Unauthorized** 

If your hybrid node was able to communicate with the EKS control plane but was not able to register, you might see logs similar to the following. Note the key difference in the log messages below is the `Unauthorized` error. This signals that the node was not able to perform its tasks because it does not have the required permissions.

```
"Failed to ensure lease exists, will retry" err="Unauthorized"
```

```
"Unable to register node with API server" err="Unauthorized"
```

```
Failed to contact API server when waiting for CSINode publishing: Unauthorized
```

If you see these messages, check the following to ensure your environment meets the hybrid nodes requirements detailed in [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md) and [Prepare cluster access for hybrid nodes](hybrid-nodes-cluster-prep.md).
+ Confirm the identity of the hybrid nodes matches your expected Hybrid Nodes IAM role. This can be done by running `sudo aws sts get-caller-identity` from your hybrid nodes.
+ Confirm your Hybrid Nodes IAM role has the required permissions.
+ Confirm that in your cluster you have an EKS access entry for your Hybrid Nodes IAM role or confirm that your `aws-auth` ConfigMap has an entry for your Hybrid Nodes IAM role. If you are using EKS access entries, confirm your access entry for your Hybrid Nodes IAM role has the `HYBRID_LINUX` access type. If you are using the `aws-auth` ConfigMap, confirm your entry for the Hybrid Nodes IAM role meets the requirements and formatting detailed in [Prepare cluster access for hybrid nodes](hybrid-nodes-cluster-prep.md).
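
You can list the access entries for your cluster and inspect the entry for your Hybrid Nodes IAM role with commands similar to the following; the cluster name, Region, account ID, and role name are placeholders. If you are using access entries, the `type` field in the output of the second command should be `HYBRID_LINUX`.

```
aws eks list-access-entries --cluster-name CLUSTER_NAME --region REGION_NAME
aws eks describe-access-entry --cluster-name CLUSTER_NAME --region REGION_NAME \
  --principal-arn arn:aws:iam::111122223333:role/HYBRID_NODES_ROLE
```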

### Hybrid nodes registered with EKS cluster but show status `Not Ready`
<a name="hybrid-nodes-troubleshooting-not-ready"></a>

If your hybrid nodes successfully registered with your EKS cluster, but the hybrid nodes show status `Not Ready`, the first thing to check is your Container Networking Interface (CNI) status. If you have not installed a CNI, then it is expected that your hybrid nodes have status `Not Ready`. Once a CNI is installed and running successfully, nodes are updated to the status `Ready`. If you attempted to install a CNI but it is not running successfully, see [Hybrid nodes CNI troubleshooting](#hybrid-nodes-troubleshooting-cni) on this page.

 **Certificate Signing Requests (CSRs) are stuck Pending** 

After connecting hybrid nodes to your EKS cluster, if you see pending CSRs for your hybrid nodes, your hybrid nodes are not meeting the requirements for automatic approval. CSRs for hybrid nodes are automatically approved and issued if they were created by a node with the `eks.amazonaws.com/compute-type: hybrid` label and the CSR has the following Subject Alternative Names (SANs): at least one DNS SAN equal to the node name, and IP SANs that belong to the remote node network CIDRs.
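
To see pending CSRs and the nodes that requested them, you can run the following command.

```
kubectl get csr -o wide
```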

 **Hybrid profile already exists** 

If you changed your `nodeadm` configuration and attempt to re-register the node with the new configuration, you might see an error that states that the hybrid profile already exists but its contents have changed. Instead of running `nodeadm init` between configuration changes, run `nodeadm uninstall` followed by `nodeadm install` and `nodeadm init`. This ensures a proper cleanup when the configuration changes.

```
"msg":"Command failed","error":"hybrid profile already exists at /etc/aws/hybrid/config but its contents do not align with the expected configuration"
```

 **Hybrid node failed to resolve Private API** 

After running `nodeadm init`, if you see an error in the `kubelet` logs that shows failures to contact the EKS Kubernetes API server because there is `no such host`, you might have to change your DNS entry for the EKS Kubernetes API endpoint in your on-premises network or at the host level. See [Forwarding inbound DNS queries to your VPC](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver-forwarding-inbound-queries.html) in the * AWS Route53 documentation*.

```
Failed to contact API server when waiting for CSINode publishing: Get ... no such host
```
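
To check how the EKS Kubernetes API endpoint resolves from the node, you can query it directly; replace the placeholder with the hostname portion of your cluster's API server endpoint.

```
nslookup EKS_API_SERVER_ENDPOINT
```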

 **Can’t view hybrid nodes in the EKS console** 

If you have registered your hybrid nodes but are unable to view them in the EKS console, check the permissions of the IAM principal you are using to view the console. The IAM principal you’re using must have specific minimum IAM and Kubernetes permissions to view resources in the console. For more information, see [View Kubernetes resources in the AWS Management Console](view-kubernetes-resources.md).

## Running hybrid nodes troubleshooting
<a name="_running_hybrid_nodes_troubleshooting"></a>

If your hybrid nodes registered with your EKS cluster, had status `Ready`, and then transitioned to status `Not Ready`, a wide range of issues might have contributed to the unhealthy status, such as the node lacking sufficient CPU, memory, or available disk space, or the node being disconnected from the EKS control plane. You can use the steps below to troubleshoot your nodes, and if you cannot resolve the issue, contact AWS Support.

Run `nodeadm debug` from your hybrid nodes to validate networking and credential requirements are met. For more information on the `nodeadm debug` command, see [Hybrid nodes `nodeadm` reference](hybrid-nodes-nodeadm.md).

 **Get node status** 

```
kubectl get nodes -o wide
```

 **Check node conditions and events** 

```
kubectl describe node NODE_NAME
```

 **Get pod status** 

```
kubectl get pods -A -o wide
```

 **Check pod conditions and events** 

```
kubectl describe pod POD_NAME
```

 **Check pod logs** 

```
kubectl logs POD_NAME
```

 **Check `kubelet` status and logs** 

```
systemctl status kubelet
```

```
journalctl -u kubelet -f
```

 **Pod liveness probes failing or webhooks are not working** 

If applications, add-ons, or webhooks running on your hybrid nodes are not starting properly, you might have networking issues that block the communication to the pods. For the EKS control plane to contact webhooks running on hybrid nodes, you must configure your EKS cluster with a remote pod network and have routes for your on-premises pod CIDR in your VPC routing table with the target as your Transit Gateway (TGW), virtual private gateway (VGW), or other gateway you are using to connect your VPC with your on-premises network. For more information on the networking requirements for hybrid nodes, see [Prepare networking for hybrid nodes](hybrid-nodes-networking.md). You additionally must allow this traffic in your on-premises firewall and ensure your router can properly route to your pods. See [Configure webhooks for hybrid nodes](hybrid-nodes-webhooks.md) for more information on the requirements for running webhooks on hybrid nodes.

A common pod log message for this scenario is shown below, where `ip-address` is the cluster IP of the Kubernetes service.

```
dial tcp <ip-address>:443: connect: no route to host
```

 ** `kubectl logs` or `kubectl exec` commands not working (`kubelet` API commands)** 

If `kubectl attach`, `kubectl cp`, `kubectl exec`, `kubectl logs`, and `kubectl port-forward` commands time out while other `kubectl` commands succeed, the issue is likely related to remote network configuration. These commands connect through the cluster to the `kubelet` endpoint on the node. For more information see [`kubelet` endpoint](hybrid-nodes-concepts-kubernetes.md#hybrid-nodes-concepts-k8s-kubelet-api).

Verify that your node IPs and pod IPs fall within the remote node network and remote pod network CIDRs configured for your cluster. Use the commands below to examine IP assignments.

```
kubectl get nodes -o wide
```

```
kubectl get pods -A -o wide
```

Compare these IPs with your configured remote network CIDRs to ensure proper routing. For network configuration requirements, see [Prepare networking for hybrid nodes](hybrid-nodes-networking.md).

## Hybrid nodes CNI troubleshooting
<a name="hybrid-nodes-troubleshooting-cni"></a>

If you run into issues with initially starting Cilium or Calico with hybrid nodes, it is most often due to networking issues between the hybrid nodes (or the CNI pods running on them) and the EKS control plane. Make sure your environment meets the requirements in [Prepare networking for hybrid nodes](hybrid-nodes-networking.md). It’s useful to break down the problem into parts.

EKS cluster configuration  
Are the RemoteNodeNetwork and RemotePodNetwork configurations correct?

VPC configuration  
Are there routes for the RemoteNodeNetwork and RemotePodNetwork in the VPC routing table with the target of the Transit Gateway or Virtual Private Gateway?

Security group configuration  
Are there inbound and outbound rules for the RemoteNodeNetwork and RemotePodNetwork?

On-premises network  
Are there routes and access to and from the EKS control plane and to and from the hybrid nodes and the pods running on hybrid nodes?

CNI configuration  
If using an overlay network, does the IP pool configuration for the CNI match the RemotePodNetwork configured for the EKS cluster (required if you are using webhooks)?

 **Hybrid node has status `Ready` without a CNI installed** 

If your hybrid nodes are showing status `Ready`, but you have not installed a CNI on your cluster, it is possible that there are old CNI artifacts on your hybrid nodes. By default, when you uninstall Cilium and Calico with tools such as Helm, the on-disk resources are not removed from your physical or virtual machines. Additionally, the Custom Resource Definitions (CRDs) for these CNIs might still be present on your cluster from an old installation. For more information, see the Delete Cilium and Delete Calico sections of [Configure CNI for hybrid nodes](hybrid-nodes-cni.md).
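
To check for leftover CNI configuration on the node and leftover CRDs in the cluster, you can run commands similar to the following.

```
ls /etc/cni/net.d /opt/cni/bin
kubectl get crds | grep -iE 'cilium|calico|tigera'
```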

 **Cilium troubleshooting** 

If you are having issues running Cilium on hybrid nodes, see [the troubleshooting steps](https://docs.cilium.io/en/stable/operations/troubleshooting/) in the Cilium documentation. The sections below cover issues that might be unique to deploying Cilium on hybrid nodes.

 **Cilium isn’t starting** 

If the Cilium agents that run on each hybrid node are not starting, check the logs of the Cilium agent pods for errors. The Cilium agent requires connectivity to the EKS Kubernetes API endpoint to start. Cilium agent startup will fail if this connectivity is not correctly configured. In this case, you will see log messages similar to the following in the Cilium agent pod logs.

```
msg="Unable to contact k8s api-server"
level=fatal msg="failed to start: Get \"https://<k8s-cluster-ip>:443/api/v1/namespaces/kube-system\": dial tcp <k8s-cluster-ip>:443: i/o timeout"
```

The Cilium agent runs on the host network. Your EKS cluster must be configured with `RemoteNodeNetwork` for the Cilium connectivity. Confirm you have an additional security group for your EKS cluster with an inbound rule for your `RemoteNodeNetwork`, that you have routes in your VPC for your `RemoteNodeNetwork`, and that your on-premises network is configured correctly to allow connectivity to the EKS control plane.

If the Cilium operator is running and some of your Cilium agents are running but not all, confirm that you have available pod IPs to allocate for all nodes in your cluster. You configure the size of your allocatable Pod CIDRs when using cluster pool IPAM with `clusterPoolIPv4PodCIDRList` in your Cilium configuration. The per-node CIDR size is configured with the `clusterPoolIPv4MaskSize` setting in your Cilium configuration. See [Expanding the cluster pool](https://docs.cilium.io/en/stable/network/concepts/ipam/cluster-pool/#expanding-the-cluster-pool) in the Cilium documentation for more information.
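
As a hedged sketch of the relevant Helm values (the names assume the upstream Cilium chart and the CIDR and mask size are examples; adjust them for your installation), the cluster pool IPAM configuration looks similar to the following.

```
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:   # Should match the remote pod network configured for your EKS cluster
      - 10.86.0.0/16
    clusterPoolIPv4MaskSize: 25   # Per-node pod CIDR size
```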

 **Cilium BGP is not working** 

If you are using Cilium BGP Control Plane to advertise your pod or service addresses to your on-premises network, you can use the following Cilium CLI commands to check if BGP is advertising the routes to your resources. For steps to install the Cilium CLI, see [Install the Cilium CLI](https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/#install-the-cilium-cli) in the Cilium documentation.

If BGP is working correctly, you should see your hybrid nodes with Session State `established` in the output. You might need to work with your networking team to identify the correct values for your environment’s Local AS, Peer AS, and Peer Address.

```
cilium bgp peers
```

```
cilium bgp routes
```

If you are using Cilium BGP to advertise the IPs of Services with type `LoadBalancer`, you must have the same label on both your `CiliumLoadBalancerIPPool` and Service, which should be used in the selector of your `CiliumBGPAdvertisement`. An example is shown below. Note, if you are using Cilium BGP to advertise the IPs of Services with type LoadBalancer, the BGP routes might be disrupted during Cilium agent restart. For more information, see [Failure Scenarios](https://docs.cilium.io/en/latest/network/bgp-control-plane/bgp-control-plane-operation/#failure-scenarios) in the Cilium documentation.

 **Service** 

```
kind: Service
apiVersion: v1
metadata:
  name: guestbook
  labels:
    app: guestbook
spec:
  ports:
  - port: 3000
    targetPort: http-server
  selector:
    app: guestbook
  type: LoadBalancer
```

 **CiliumLoadBalancerIPPool** 

```
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: guestbook-pool
  labels:
    app: guestbook
spec:
  blocks:
  - cidr: <CIDR to advertise>
  serviceSelector:
    matchExpressions:
      - { key: app, operator: In, values: [ guestbook ] }
```

 **CiliumBGPAdvertisement** 

```
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: bgp-advertisements-guestbook
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: "Service"
      service:
        addresses:
          - ExternalIP
          - LoadBalancerIP
      selector:
        matchExpressions:
          - { key: app, operator: In, values: [ guestbook ] }
```

 **Calico troubleshooting** 

If you are having issues running Calico on hybrid nodes, see [the troubleshooting steps](https://docs.tigera.io/calico/latest/operations/troubleshoot/) in the Calico documentation. The sections below cover issues that might be unique to deploying Calico on hybrid nodes.

The table below summarizes the Calico components and whether they run on the node or pod network by default. If you configured Calico to use NAT for outgoing pod traffic, your on-premises network must be configured to route traffic to your on-premises node CIDR and your VPC routing tables must be configured with a route for your on-premises node CIDR with your transit gateway (TGW) or virtual private gateway (VGW) as the target. If you are not configuring Calico to use NAT for outgoing pod traffic, your on-premises network must be configured to route traffic to your on-premises pod CIDR and your VPC routing tables must be configured with a route for your on-premises pod CIDR with your transit gateway (TGW) or virtual private gateway (VGW) as the target.


| Component | Network | 
| --- | --- | 
|  Calico API server  |  Node  | 
|  Calico Controllers for Kubernetes  |  Pod  | 
|  Calico node agent  |  Node  | 
|  Calico `typha`   |  Node  | 
|  Calico CSI node driver  |  Pod  | 
|  Calico operator  |  Node  | 

 **Calico resources are scheduled or running on cordoned nodes** 

The Calico resources that don’t run as a DaemonSet have flexible tolerations by default that enable them to be scheduled on cordoned nodes that are not ready for scheduling or running pods. You can tighten the tolerations for the non-DaemonSet Calico resources by changing your operator installation to include the following.

```
installation:
  ...
  controlPlaneTolerations:
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  calicoKubeControllersDeployment:
    spec:
      template:
        spec:
          tolerations:
          - effect: NoExecute
            key: node.kubernetes.io/unreachable
            operator: Exists
            tolerationSeconds: 300
          - effect: NoExecute
            key: node.kubernetes.io/not-ready
            operator: Exists
            tolerationSeconds: 300
  typhaDeployment:
    spec:
      template:
        spec:
          tolerations:
          - effect: NoExecute
            key: node.kubernetes.io/unreachable
            operator: Exists
            tolerationSeconds: 300
          - effect: NoExecute
            key: node.kubernetes.io/not-ready
            operator: Exists
            tolerationSeconds: 300
```

## Credentials troubleshooting
<a name="hybrid-nodes-troubleshooting-creds"></a>

For both AWS SSM hybrid activations and AWS IAM Roles Anywhere, you can validate that credentials for the Hybrid Nodes IAM role are correctly configured on your hybrid nodes by running the following command from your hybrid nodes. Confirm the node name and Hybrid Nodes IAM Role name are what you expect.

```
sudo aws sts get-caller-identity
```

```
{
    "UserId": "ABCDEFGHIJKLM12345678910:<node-name>",
    "Account": "<aws-account-id>",
    "Arn": "arn:aws:sts::<aws-account-id>:assumed-role/<hybrid-nodes-iam-role/<node-name>"
}
```

 ** AWS Systems Manager (SSM) troubleshooting** 

If you are using AWS SSM hybrid activations for your hybrid nodes credentials, be aware of the following SSM directories and artifacts that are installed on your hybrid nodes by `nodeadm`. For more information on the SSM agent, see [Working with the SSM agent](https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent.html) in the * AWS Systems Manager User Guide*.


| Description | Location | 
| --- | --- | 
|  SSM agent  |  Ubuntu - `/snap/amazon-ssm-agent/current/amazon-ssm-agent` RHEL & AL2023 - `/usr/bin/amazon-ssm-agent`   | 
|  SSM agent logs  |   `/var/log/amazon/ssm`   | 
|   AWS credentials  |   `/root/.aws/credentials`   | 
|  SSM Setup CLI  |   `/opt/ssm/ssm-setup-cli`   | 

 **Restarting the SSM agent** 

Some issues can be resolved by restarting the SSM agent. You can use the commands below to restart it.

 **AL2023 and other operating systems** 

```
systemctl restart amazon-ssm-agent
```

 **Ubuntu** 

```
systemctl restart snap.amazon-ssm-agent.amazon-ssm-agent
```

 **Check connectivity to SSM endpoints** 

Confirm you can connect to the SSM endpoints from your hybrid nodes. For a list of the SSM endpoints, see [AWS Systems Manager endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/ssm.html). Replace `us-west-2` in the command below with the AWS Region for your AWS SSM hybrid activation.

```
ping ssm.us-west-2.amazonaws.com
```

 **View connection status of registered SSM instances** 

You can check the connection status of the instances that are registered with SSM hybrid activations with the following AWS CLI command. Replace the machine ID with the machine ID of your instance.

```
aws ssm get-connection-status --target mi-012345678abcdefgh
```

 **SSM Setup CLI checksum mismatch** 

When running `nodeadm install`, if you see a checksum mismatch for `ssm-setup-cli`, confirm there are no older existing SSM installations on your host. If there are older SSM installations on your host, remove them and re-run `nodeadm install` to resolve the issue.

```
Failed to perform agent-installation/on-prem registration: error while verifying installed ssm-setup-cli checksum: checksum mismatch with latest ssm-setup-cli.
```

 **SSM `InvalidActivation` ** 

If you see an error registering your instance with AWS SSM, confirm the `region`, `activationCode`, and `activationId` in your `nodeConfig.yaml` are correct. The AWS Region for your EKS cluster must match the region of your SSM hybrid activation. If these values are misconfigured, you might see an error similar to the following.

```
ERROR Registration failed due to error registering the instance with AWS SSM. InvalidActivation
```

 **SSM `ExpiredTokenException`: The security token included in the request is expired** 

If the SSM agent is not able to refresh credentials, you might see an `ExpiredTokenException`. In this scenario, if you are able to connect to the SSM endpoints from your hybrid nodes, you might need to restart the SSM agent to force a credential refresh.

```
"msg":"Command failed","error":"operation error SSM: DescribeInstanceInformation, https response error StatusCode: 400, RequestID: eee03a9e-f7cc-470a-9647-73d47e4cf0be, api error ExpiredTokenException: The security token included in the request is expired"
```

 **SSM error in running register machine command** 

If you see an error registering the machine with SSM, you might need to re-run `nodeadm install` to make sure all of the SSM dependencies are properly installed.

```
"error":"running register machine command: , error: fork/exec /opt/aws/ssm-setup-cli: no such file or directory"
```

 **SSM `ActivationExpired` ** 

When running `nodeadm init`, if you see an error registering the instance with SSM due to an expired activation, you need to create a new SSM hybrid activation, update your `nodeConfig.yaml` with the `activationCode` and `activationId` of your new SSM hybrid activation, and re-run `nodeadm init`.

```
"msg":"Command failed","error":"SSM activation expired. Please use a valid activation"
```

```
ERROR Registration failed due to error registering the instance with AWS SSM. ActivationExpired
```

 **SSM failed to refresh cached credentials** 

If you see a failure to refresh cached credentials, the `/root/.aws/credentials` file might have been deleted on your host. First check your SSM hybrid activation and ensure it is active and your hybrid nodes are configured correctly to use the activation. Check the SSM agent logs at `/var/log/amazon/ssm` and re-run the `nodeadm init` command once you have resolved the issue on the SSM side.

```
"Command failed","error":"operation error SSM: DescribeInstanceInformation, get identity: get credentials: failed to refresh cached credentials"
```

 **Clean up SSM** 

To remove the SSM agent from your host, you can run the following commands.

```
sudo dnf remove -y amazon-ssm-agent        # RHEL and AL2023
sudo apt remove --purge amazon-ssm-agent   # Ubuntu (deb-based install)
sudo snap remove amazon-ssm-agent          # Ubuntu (snap-based install)
sudo rm -rf /var/lib/amazon/ssm/Vault/Store/RegistrationKey
```

 ** AWS IAM Roles Anywhere troubleshooting** 

If you are using AWS IAM Roles Anywhere for your hybrid nodes credentials, be aware of the following directories and artifacts that are installed on your hybrid nodes by `nodeadm`. For more information on troubleshooting IAM Roles Anywhere, see [Troubleshooting AWS IAM Roles Anywhere identity and access](https://docs.aws.amazon.com/rolesanywhere/latest/userguide/security_iam_troubleshoot.html) in the * AWS IAM Roles Anywhere User Guide*.


| Description | Location | 
| --- | --- | 
|  IAM Roles Anywhere CLI  |   `/usr/local/bin/aws_signing_helper`   | 
|  Default certificate location and name  |   `/etc/iam/pki/server.pem`   | 
|  Default key location and name  |   `/etc/iam/pki/server.key`   | 

 **IAM Roles Anywhere failed to refresh cached credentials** 

If you see a failure to refresh cached credentials, review the contents of `/etc/aws/hybrid/config` and confirm that IAM Roles Anywhere was configured correctly in your `nodeadm` configuration. Confirm that `/etc/iam/pki` exists. Each node must have a unique certificate and key. By default, when using IAM Roles Anywhere as the credential provider, `nodeadm` uses `/etc/iam/pki/server.pem` for the certificate location and name, and `/etc/iam/pki/server.key` for the private key. You might need to create the directory with `sudo mkdir -p /etc/iam/pki` before placing the certificate and key in it. You can verify the content of your certificate with the command below.

```
openssl x509 -text -noout -in server.pem
```

```
open /etc/iam/pki/server.pem: no such file or directory
could not parse PEM data
Command failed {"error": "... get identity: get credentials: failed to refresh cached credentials, process provider error: error in credential_process: exit status 1"}
```

 **IAM Roles Anywhere not authorized to perform `sts:AssumeRole` ** 

In the `kubelet` logs, if you see an access denied issue for the `sts:AssumeRole` operation when using IAM Roles Anywhere, check the trust policy of your Hybrid Nodes IAM role to confirm the IAM Roles Anywhere service principal is allowed to assume the Hybrid Nodes IAM Role. Additionally confirm that the trust anchor ARN is configured properly in your Hybrid Nodes IAM role trust policy and that your Hybrid Nodes IAM role is added to your IAM Roles Anywhere profile.

```
could not get token: AccessDenied: User: ... is not authorized to perform: sts:AssumeRole on resource: ...
```
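
As a reference sketch (the ARNs are placeholders; confirm the exact policy in [Prepare credentials for hybrid nodes](hybrid-nodes-creds.md)), a trust policy that lets IAM Roles Anywhere assume the role and scopes it to your trust anchor looks similar to the following.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "rolesanywhere.amazonaws.com" },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession",
        "sts:SetSourceIdentity"
      ],
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:rolesanywhere:REGION:111122223333:trust-anchor/TRUST_ANCHOR_ID"
        }
      }
    }
  ]
}
```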

 **IAM Roles Anywhere not authorized to set `roleSessionName` ** 

In the `kubelet` logs, if you see an access denied issue for setting the `roleSessionName`, confirm you have set `acceptRoleSessionName` to true for your IAM Roles Anywhere profile.

```
AccessDeniedException: Not authorized to set roleSessionName
```

## Operating system troubleshooting
<a name="hybrid-nodes-troubleshooting-os"></a>

### RHEL
<a name="_rhel"></a>

 **Entitlement or subscription manager registration failures** 

If you are running `nodeadm install` and encounter a failure to install the hybrid nodes dependencies due to entitlement registration issues, ensure you have properly set your Red Hat username and password on your host.

```
This system is not registered with an entitlement server
```

### Ubuntu
<a name="_ubuntu"></a>

 **GLIBC not found** 

If you are using Ubuntu for your operating system and IAM Roles Anywhere for your credential provider with hybrid nodes and see an issue with GLIBC not found, you can install that dependency manually to resolve the issue.

```
GLIBC_2.32 not found (required by /usr/local/bin/aws_signing_helper)
```

Run the following commands to install the dependency:

```
ldd --version                             # Check the glibc version currently installed
sudo apt update && sudo apt install libc6
sudo apt install glibc-source
```

### Bottlerocket
<a name="_bottlerocket"></a>

If you have the Bottlerocket admin container enabled, you can access it with SSH for advanced debugging and troubleshooting with elevated privileges. The following sections contain commands that need to be run in the context of the Bottlerocket host. Once you are in the admin container, you can run `sheltie` to get a full root shell in the Bottlerocket host.

```
sheltie
```

You can also run the commands in the following sections from the admin container shell by prefixing each command with `sudo chroot /.bottlerocket/rootfs`.

```
sudo chroot /.bottlerocket/rootfs <command>
```

 **Using logdog for log collection** 

Bottlerocket provides the `logdog` utility to efficiently collect logs and system information for troubleshooting purposes.

```
logdog
```

The `logdog` utility gathers logs from various locations on a Bottlerocket host and combines them into a tarball. By default, the tarball will be created at `/var/log/support/bottlerocket-logs.tar.gz`, and is accessible from host containers at `/.bottlerocket/support/bottlerocket-logs.tar.gz`.

 **Accessing system logs with journalctl** 

You can check the status of the various system services such as `kubelet` and `containerd`, and view their logs with the following commands. The `-f` flag follows the logs in real time.

For checking `kubelet` service status and retrieving `kubelet` logs, you can run:

```
systemctl status kubelet
journalctl -u kubelet -f
```

For checking `containerd` service status and retrieving the logs for the orchestrated `containerd` instance, you can run:

```
systemctl status containerd
journalctl -u containerd -f
```

For checking `host-containerd` service status and retrieving the logs for the host `containerd` instance, you can run:

```
systemctl status host-containerd
journalctl -u host-containerd -f
```

For retrieving the logs for the bootstrap containers and host containers, you can run:

```
journalctl _COMM=host-ctr -f
```