Simplify node lifecycle with managed node groups
Amazon EKS managed node groups automate the provisioning and lifecycle management of nodes (Amazon EC2 instances) for Amazon EKS Kubernetes clusters.
With Amazon EKS managed node groups, you don’t need to separately provision or register the Amazon EC2 instances that provide compute capacity to run your Kubernetes applications. You can create, automatically update, or terminate nodes for your cluster with a single operation. Node updates and terminations automatically drain nodes to ensure that your applications stay available.
Every managed node is provisioned as part of an Amazon EC2 Auto Scaling group that's managed for you by Amazon EKS. Every resource, including the instances and Auto Scaling groups, runs within your AWS account. Each node group runs across multiple Availability Zones that you define.
You can add a managed node group to new or existing clusters using the Amazon EKS console, eksctl, the AWS CLI, the AWS API, or infrastructure as code tools, including AWS CloudFormation. Nodes launched as part of a managed node group are automatically tagged for auto-discovery by the Kubernetes Cluster Autoscaler.
There are no additional costs to use Amazon EKS managed node groups; you only pay for the AWS resources you provision. These include Amazon EC2 instances, Amazon EBS volumes, Amazon EKS cluster hours, and any other AWS infrastructure. There are no minimum fees and no upfront commitments.
To get started with a new Amazon EKS cluster and managed node group, see Get started with Amazon EKS – AWS Management Console and AWS CLI.
To add a managed node group to an existing cluster, see Create a managed node group for your cluster.
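For example, the single-operation workflow described above can be expressed declaratively with eksctl. In this sketch, the cluster name, region, instance type, and group sizes are placeholder assumptions, not values from this guide:

```yaml
# Sketch of an eksctl ClusterConfig that adds a managed node group.
# Cluster name, region, instance type, and sizes are illustrative.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: my-cluster        # assumed cluster name
  region: us-west-2       # assumed region

managedNodeGroups:
  - name: general-purpose
    instanceType: m5.large
    minSize: 2
    desiredCapacity: 3
    maxSize: 5
```

A file like this could then be applied with `eksctl create nodegroup --config-file=cluster.yaml`, which provisions the Auto Scaling group and registers the nodes with the cluster in one step.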
Managed node group concepts
- Amazon EKS managed node groups create and manage Amazon EC2 instances for you.
- Every managed node is provisioned as part of an Amazon EC2 Auto Scaling group that's managed for you by Amazon EKS. Every resource, including Amazon EC2 instances and Auto Scaling groups, runs within your AWS account.
- The Auto Scaling group of a managed node group spans every subnet that you specify when you create the group.
- Amazon EKS tags managed node group resources so that they are configured to use the Kubernetes Cluster Autoscaler.

  Important: If you are running a stateful application across multiple Availability Zones that is backed by Amazon EBS volumes and you use the Kubernetes Cluster Autoscaler, you should configure multiple node groups, each scoped to a single Availability Zone. In addition, you should enable the `--balance-similar-node-groups` feature.
- You can use a custom launch template for a greater level of flexibility and customization when deploying managed nodes. For example, you can specify extra `kubelet` arguments and use a custom AMI. For more information, see Customize managed nodes with launch templates. If you don't use a custom launch template when first creating a managed node group, an auto-generated launch template is used. Don't manually modify this auto-generated template, or errors occur.
- Amazon EKS follows the shared responsibility model for CVEs and security patches on managed node groups. When managed nodes run an Amazon EKS optimized AMI, Amazon EKS is responsible for building and publishing patched versions of the AMI when bugs or issues are reported. However, you're responsible for deploying these patched AMI versions to your managed node groups. When managed nodes run a custom AMI, you're responsible for building patched versions of the AMI when bugs or issues are reported and then deploying the AMI. For more information, see Update a managed node group for your cluster.
- Amazon EKS managed node groups can be launched in both public and private subnets. If you launch a managed node group in a public subnet on or after April 22, 2020, the subnet must have `MapPublicIpOnLaunch` set to true for the instances to successfully join a cluster. If the public subnet was created using `eksctl` or the Amazon EKS vended AWS CloudFormation templates on or after March 26, 2020, then this setting is already set to true. If the public subnets were created before March 26, 2020, you must change the setting manually. For more information, see Modifying the public IPv4 addressing attribute for your subnet.
- When deploying a managed node group in private subnets, you must ensure that it can access Amazon ECR to pull container images. You can do this by connecting a NAT gateway to the route table of the subnet or by adding the following AWS PrivateLink VPC endpoints:
  - Amazon ECR API endpoint interface – `com.amazonaws.region-code.ecr.api`
  - Amazon ECR Docker registry API endpoint interface – `com.amazonaws.region-code.ecr.dkr`
  - Amazon S3 gateway endpoint – `com.amazonaws.region-code.s3`

  For other commonly used services and endpoints, see Deploy private clusters with limited internet access.
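As one way to provide these endpoints, the following CloudFormation sketch declares all three. The `VpcId`, `PrivateSubnetIds`, `EndpointSecurityGroup`, and `PrivateRouteTable` references are assumptions that would need to be defined elsewhere in your template:

```yaml
# Illustrative fragment creating the VPC endpoints that a managed node
# group in private subnets needs to pull images from Amazon ECR.
# VpcId, PrivateSubnetIds, EndpointSecurityGroup, and PrivateRouteTable
# are assumed parameters/resources defined elsewhere in the template.
Resources:
  EcrApiEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Interface
      ServiceName: !Sub com.amazonaws.${AWS::Region}.ecr.api
      VpcId: !Ref VpcId
      SubnetIds: !Ref PrivateSubnetIds
      SecurityGroupIds:
        - !Ref EndpointSecurityGroup
      PrivateDnsEnabled: true
  EcrDkrEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Interface
      ServiceName: !Sub com.amazonaws.${AWS::Region}.ecr.dkr
      VpcId: !Ref VpcId
      SubnetIds: !Ref PrivateSubnetIds
      SecurityGroupIds:
        - !Ref EndpointSecurityGroup
      PrivateDnsEnabled: true
  S3GatewayEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Gateway
      ServiceName: !Sub com.amazonaws.${AWS::Region}.s3
      VpcId: !Ref VpcId
      RouteTableIds:
        - !Ref PrivateRouteTable
```

The interface endpoints must allow HTTPS (port 443) from the node subnets via the referenced security group for image pulls to succeed.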
- Managed node groups can't be deployed on AWS Outposts or in AWS Wavelength. Managed node groups can be created in AWS Local Zones. For more information, see Launch low-latency EKS clusters with AWS Local Zones.
- You can create multiple managed node groups within a single cluster. For example, you can create one node group with the standard Amazon EKS optimized Amazon Linux AMI for some workloads and another with the GPU variant for workloads that require GPU support.
- If your managed node group encounters an Amazon EC2 instance status check failure, Amazon EKS returns an error code to help you diagnose the issue. For more information, see Managed node group error codes.
- Amazon EKS adds Kubernetes labels to managed node group instances. These Amazon EKS provided labels are prefixed with `eks.amazonaws.com`.
- Amazon EKS automatically drains nodes using the Kubernetes API during terminations or updates.
- Pod disruption budgets aren't respected when terminating a node with `AZRebalance` or reducing the desired node count. These actions try to evict Pods on the node, but if eviction takes more than 15 minutes, the node is terminated regardless of whether all Pods on the node have terminated. To extend the period until the node is terminated, add a lifecycle hook to the Auto Scaling group. For more information, see Add lifecycle hooks in the Amazon EC2 Auto Scaling User Guide.
- To run the drain process correctly after receiving a Spot interruption notification or a capacity rebalance notification, `CapacityRebalance` must be set to `true`.
- Updating managed node groups respects the Pod disruption budgets that you set for your Pods. For more information, see Understand each phase of node updates.
- There are no additional costs to use Amazon EKS managed node groups. You only pay for the AWS resources that you provision.
- If you want to encrypt Amazon EBS volumes for your nodes, you can deploy the nodes using a launch template. To deploy managed nodes with encrypted Amazon EBS volumes without using a launch template, encrypt all new Amazon EBS volumes created in your account. For more information, see Encryption by default in the Amazon EC2 User Guide.
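For the stateful scenario flagged in the Important note above, the per-Availability-Zone node groups can be sketched in eksctl as follows. The cluster name, zones, instance type, and sizes are placeholder assumptions:

```yaml
# Sketch: one managed node group per Availability Zone, so the Cluster
# Autoscaler can scale capacity in the zone where an EBS-backed Pod's
# volume lives. Names, zones, and sizes are illustrative.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: my-cluster
  region: us-west-2

managedNodeGroups:
  - name: stateful-us-west-2a
    availabilityZones: ["us-west-2a"]   # scoped to a single AZ
    instanceType: m5.large
    minSize: 1
    maxSize: 4
  - name: stateful-us-west-2b
    availabilityZones: ["us-west-2b"]   # scoped to a single AZ
    instanceType: m5.large
    minSize: 1
    maxSize: 4
```

The `--balance-similar-node-groups` flag itself is passed in the Cluster Autoscaler deployment's command arguments, not in the node group configuration.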
Managed node group capacity types
When creating a managed node group, you can choose either the On-Demand or Spot capacity type. Amazon EKS deploys a managed node group with an Amazon EC2 Auto Scaling group that contains either only On-Demand Instances or only Amazon EC2 Spot Instances. Within a single Kubernetes cluster, you can schedule Pods for fault tolerant applications to Spot managed node groups and fault intolerant applications to On-Demand node groups. By default, a managed node group deploys On-Demand Amazon EC2 instances.
On-Demand
With On-Demand Instances, you pay for compute capacity by the second, with no long-term commitments.
By default, if you don’t specify a Capacity Type, the managed node group is provisioned with On-Demand Instances. A managed node group configures an Amazon EC2 Auto Scaling group on your behalf with the following settings applied:
- The allocation strategy to provision On-Demand capacity is set to `prioritized`. Managed node groups use the order of instance types passed in the API to determine which instance type to use first when fulfilling On-Demand capacity. For example, you might specify three instance types in the following order: `c5.large`, `c4.large`, and `c3.large`. When your On-Demand Instances are launched, the managed node group fulfills On-Demand capacity by starting with `c5.large`, then `c4.large`, and then `c3.large`. For more information, see Amazon EC2 Auto Scaling group in the Amazon EC2 Auto Scaling User Guide.
- Amazon EKS adds the following Kubernetes label to all nodes in your managed node group that specifies the capacity type: `eks.amazonaws.com/capacityType: ON_DEMAND`. You can use this label to schedule stateful or fault intolerant applications on On-Demand nodes.
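For instance, the capacity-type label can be used as a `nodeSelector` in a Pod spec to pin a fault-intolerant workload to On-Demand nodes. The Deployment name and image in this sketch are hypothetical:

```yaml
# Sketch: pin a fault-intolerant Deployment to On-Demand nodes using
# the capacity-type label Amazon EKS applies to managed nodes.
# Name and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-db-proxy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-db-proxy
  template:
    metadata:
      labels:
        app: orders-db-proxy
    spec:
      nodeSelector:
        eks.amazonaws.com/capacityType: ON_DEMAND
      containers:
        - name: app
          image: my-registry/orders-db-proxy:latest
```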
Spot
Amazon EC2 Spot Instances are spare Amazon EC2 capacity that offers steep discounts off On-Demand prices. Amazon EC2 Spot Instances can be interrupted with a two-minute interruption notice when Amazon EC2 needs the capacity back. For more information, see Spot Instances in the Amazon EC2 User Guide. You can configure a managed node group with Amazon EC2 Spot Instances to optimize costs for the compute nodes running in your Amazon EKS cluster.
To use Spot Instances inside a managed node group, create a managed node group by setting the capacity type as `spot`. A managed node group configures an Amazon EC2 Auto Scaling group on your behalf with the following Spot best practices applied:
- To ensure that your Spot nodes are provisioned in the optimal Spot capacity pools, the allocation strategy is set to one of the following:
  - `price-capacity-optimized` (PCO) – When creating new node groups in a cluster with Kubernetes version `1.28` or higher, the allocation strategy is set to `price-capacity-optimized`. However, the allocation strategy won't be changed for node groups already created with `capacity-optimized` before Amazon EKS managed node groups started to support PCO.
  - `capacity-optimized` (CO) – When creating new node groups in a cluster with Kubernetes version `1.27` or lower, the allocation strategy is set to `capacity-optimized`.

  To increase the number of Spot capacity pools available for allocating capacity from, configure a managed node group to use multiple instance types.
- Amazon EC2 Spot Capacity Rebalancing is enabled so that Amazon EKS can gracefully drain and rebalance your Spot nodes to minimize application disruption when a Spot node is at elevated risk of interruption. For more information, see Amazon EC2 Auto Scaling Capacity Rebalancing in the Amazon EC2 Auto Scaling User Guide.
  - When a Spot node receives a rebalance recommendation, Amazon EKS automatically attempts to launch a new replacement Spot node.
  - If a Spot two-minute interruption notice arrives before the replacement Spot node is in a `Ready` state, Amazon EKS starts draining the Spot node that received the rebalance recommendation. Amazon EKS drains the node on a best-effort basis. As a result, there's no guarantee that Amazon EKS will wait for the replacement node to join the cluster before draining the existing node.
  - When a replacement Spot node is bootstrapped and in the `Ready` state on Kubernetes, Amazon EKS cordons and drains the Spot node that received the rebalance recommendation. Cordoning the Spot node ensures that the service controller doesn't send any new requests to this Spot node. It also removes the node from its list of healthy, active Spot nodes. Draining the Spot node ensures that running Pods are evicted gracefully.
- Amazon EKS adds the following Kubernetes label to all nodes in your managed node group that specifies the capacity type: `eks.amazonaws.com/capacityType: SPOT`. You can use this label to schedule fault tolerant applications on Spot nodes.
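As a counterpart to the On-Demand example, an interruption-tolerant batch workload can target Spot nodes with the same label. The Job name and image in this sketch are hypothetical:

```yaml
# Sketch: schedule an interruption-tolerant batch Job onto Spot nodes
# using the capacity-type label. Name and image are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-etl
spec:
  template:
    spec:
      nodeSelector:
        eks.amazonaws.com/capacityType: SPOT
      restartPolicy: OnFailure
      containers:
        - name: etl
          image: my-registry/nightly-etl:latest
```

If a Spot node is reclaimed mid-run, the Job controller reschedules the Pod on another available node.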
When deciding whether to deploy a node group with On-Demand or Spot capacity, you should consider the following conditions:
- Spot Instances are a good fit for stateless, fault-tolerant, flexible applications. These include batch and machine learning training workloads, big data ETLs such as Apache Spark, queue processing applications, and stateless API endpoints. Because Spot is spare Amazon EC2 capacity, which can change over time, we recommend that you use Spot capacity for interruption-tolerant workloads. More specifically, Spot capacity is suitable for workloads that can tolerate periods where the required capacity isn't available.
- We recommend that you use On-Demand for applications that are fault intolerant. This includes cluster management tools such as monitoring and operational tools, deployments that require `StatefulSets`, and stateful applications, such as databases.
- To maximize the availability of your applications while using Spot Instances, we recommend that you configure a Spot managed node group to use multiple instance types. We recommend applying the following rules when using multiple instance types:
  - Within a managed node group, if you're using the Cluster Autoscaler, we recommend using a flexible set of instance types with the same amount of vCPU and memory resources. This is to ensure that the nodes in your cluster scale as expected. For example, if you need four vCPUs and eight GiB of memory, use `c3.xlarge`, `c4.xlarge`, `c5.xlarge`, `c5d.xlarge`, `c5a.xlarge`, `c5n.xlarge`, or other similar instance types.
  - To enhance application availability, we recommend deploying multiple Spot managed node groups. Each group should use a flexible set of instance types that have the same vCPU and memory resources. For example, if you need 4 vCPUs and 8 GiB of memory, we recommend that you create one managed node group with `c3.xlarge`, `c4.xlarge`, `c5.xlarge`, `c5d.xlarge`, `c5a.xlarge`, `c5n.xlarge`, or other similar instance types, and a second managed node group with `m3.xlarge`, `m4.xlarge`, `m5.xlarge`, `m5d.xlarge`, `m5a.xlarge`, `m5n.xlarge`, or other similar instance types.
  - When deploying your node group with the Spot capacity type using a custom launch template, use the API to pass multiple instance types. Don't pass a single instance type through the launch template. For more information about deploying a node group using a launch template, see Customize managed nodes with launch templates.
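Putting the multiple-instance-type guidance together, two Spot managed node groups might be sketched in eksctl as follows. The cluster name, region, group names, and sizes are placeholder assumptions:

```yaml
# Sketch: two Spot managed node groups, each with a flexible set of
# instance types sharing the same vCPU/memory shape, per the guidance
# above. Names, region, and sizes are illustrative.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: my-cluster
  region: us-west-2

managedNodeGroups:
  - name: spot-4vcpu-c-family
    spot: true
    instanceTypes:
      ["c3.xlarge", "c4.xlarge", "c5.xlarge",
       "c5d.xlarge", "c5a.xlarge", "c5n.xlarge"]
    minSize: 0
    maxSize: 10
  - name: spot-4vcpu-m-family
    spot: true
    instanceTypes:
      ["m3.xlarge", "m4.xlarge", "m5.xlarge",
       "m5d.xlarge", "m5a.xlarge", "m5n.xlarge"]
    minSize: 0
    maxSize: 10
```

Passing the types through `instanceTypes` (rather than a launch template) follows the rule above of supplying multiple instance types via the API.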