
Amazon Redshift provisioned clusters

An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases.

Note

Currently, the Amazon Redshift version 1.0 engine is available. However, as the engine is updated, multiple Amazon Redshift engine versions might become available for selection.

Clusters and nodes in Amazon Redshift

An Amazon Redshift cluster consists of nodes. Each cluster has a leader node and one or more compute nodes. The leader node receives queries from client applications, parses the queries, and develops query execution plans. The leader node then coordinates the parallel execution of these plans with the compute nodes and aggregates the intermediate results from these nodes. Finally, it returns the results to the client applications.

Compute nodes run the query execution plans and transmit data among themselves to serve these queries. The intermediate results are sent to the leader node for aggregation before being sent back to the client applications. For more information about leader nodes and compute nodes, see Data warehouse system architecture in the Amazon Redshift Database Developer Guide.
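The division of work described above (compute nodes produce intermediate results, the leader node aggregates them) can be illustrated with a toy scatter-gather sketch. This is a hypothetical model for intuition only, not how Amazon Redshift is implemented: the function names and data are invented, and data exchange between compute nodes is not modeled.

```python
from collections import Counter

# Toy model (hypothetical, not Redshift internals): each "compute node" owns a
# partition of (key, amount) rows and computes a partial SUM(amount) GROUP BY key;
# the "leader" merges those intermediate results into the final answer.


def compute_node_partial(rows: list[tuple[str, int]]) -> Counter:
    """Per-node step of the plan: partial SUM(amount) grouped by key."""
    partial: Counter = Counter()
    for key, amount in rows:
        partial[key] += amount
    return partial


def leader_aggregate(partials: list[Counter]) -> dict[str, int]:
    """Leader step: combine the partial aggregates returned by every node."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)


# Hypothetical rows already distributed across two compute nodes.
node_data = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
]
partials = [compute_node_partial(rows) for rows in node_data]
print(leader_aggregate(partials))  # {'east': 17, 'west': 5, 'north': 3}
```

Because each node reduces its own rows first, only small partial results travel to the leader, which is why adding compute nodes can speed up this kind of query.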

Note

When you create a cluster on the Amazon Redshift console (https://console.aws.amazon.com/redshiftv2/), you can get a recommendation of your cluster configuration based on the size of your data and query characteristics. To use this sizing calculator, look for Help me choose on the console in AWS Regions that support RA3 node types. For more information, see Creating a cluster.

When you launch a cluster, one option that you specify is the node type. The node type determines the CPU, RAM, storage capacity, and storage drive type for each node.

Amazon Redshift offers different node types to accommodate your workloads, and we recommend choosing RA3 or DC2 depending on the required performance, data size, and expected data growth.

RA3 nodes with managed storage enable you to optimize your data warehouse by scaling and paying for compute and managed storage independently. With RA3, you choose the number of nodes based on your performance requirements and only pay for the managed storage that you use. Size your RA3 cluster based on the amount of data you process daily. You launch clusters that use the RA3 node types in a virtual private cloud (VPC). You can't launch RA3 clusters in EC2-Classic. For more information, see Creating a Redshift provisioned cluster or Amazon Redshift Serverless workgroup in a VPC.

Amazon Redshift managed storage uses large, high-performance SSDs in each RA3 node for fast local storage and Amazon S3 for longer-term durable storage. If the data in a node grows beyond the size of the large local SSDs, Amazon Redshift managed storage automatically offloads that data to Amazon S3. You pay the same low rate for Amazon Redshift managed storage regardless of whether the data sits in high-performance SSDs or Amazon S3. For workloads that require ever-growing storage, managed storage lets you automatically scale your data warehouse storage capacity independently of compute nodes.

DC2 nodes enable you to have compute-intensive data warehouses with local SSD storage included. You choose the number of nodes you need based on data size and performance requirements. DC2 nodes store your data locally for high performance, and as the data size grows, you can add more compute nodes to increase the storage capacity of the cluster. For datasets under 1 TB (compressed), we recommend DC2 node types for the best performance at the lowest price. If you expect your data to grow, we recommend using RA3 nodes so you can size compute and storage independently to achieve improved price and performance. You launch clusters that use the DC2 node types in a virtual private cloud (VPC). You can't launch DC2 clusters in EC2-Classic. For more information, see Creating a Redshift provisioned cluster or Amazon Redshift Serverless workgroup in a VPC.
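The DC2 versus RA3 guidance above can be encoded as a small rule of thumb. This is a hypothetical helper, not an AWS API. It covers only the two stated cases (under 1 TB compressed favors DC2; expected growth favors RA3). Defaulting to RA3 for stable datasets of 1 TB or more is an assumption, because the text does not address that case directly.

```python
def suggest_node_family(compressed_data_tb: float, expect_growth: bool) -> str:
    """Return "DC2" or "RA3" following the sizing guidance (hypothetical helper)."""
    if compressed_data_tb < 1 and not expect_growth:
        # Small, stable dataset: DC2 for the best performance at the lowest price.
        return "DC2"
    # Expected growth: RA3 lets you size compute and managed storage independently.
    # Assumption: stable datasets of 1 TB or more also default to RA3 here.
    return "RA3"


print(suggest_node_family(0.4, expect_growth=False))  # DC2
print(suggest_node_family(0.4, expect_growth=True))   # RA3
```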

Node types are available in different sizes. Node size and the number of nodes determine the total storage for a cluster. For more information, see Node type details.

Some node types support either a single node (single-node) or two or more nodes (multi-node); clusters of other node types require at least two nodes. On a single-node cluster, the one node performs both leader and compute functions. Single-node clusters are not recommended for production workloads. On a multi-node cluster, the leader node is separate from the compute nodes and uses the same node type as the compute nodes. You pay only for the compute nodes.
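When you create a cluster programmatically, the single-node versus multi-node choice maps to the ClusterType and NumberOfNodes parameters of the CreateCluster API (NumberOfNodes is the number of compute nodes and is required for multi-node clusters). The following sketch builds only that sizing subset of the request and checks it against the creation-time node ranges in the node type tables of this section. The helper name and the dictionary of ranges are illustrative assumptions; your account quota may allow fewer nodes.

```python
# Node ranges available at cluster creation, copied from the node type tables
# in this section (single-node and multi-node rows combined per node type).
# Your account quota in the Region may allow fewer nodes.
CREATE_NODE_RANGE = {
    "ra3.large": (1, 16),
    "ra3.xlplus": (1, 16),
    "ra3.4xlarge": (2, 32),
    "ra3.16xlarge": (2, 128),
    "dc2.large": (1, 32),
    "dc2.8xlarge": (2, 128),
}


def cluster_size_params(node_type: str, number_of_nodes: int) -> dict:
    """Build the sizing subset of CreateCluster parameters (hypothetical helper)."""
    low, high = CREATE_NODE_RANGE[node_type]
    if not low <= number_of_nodes <= high:
        raise ValueError(
            f"{node_type} supports {low}-{high} nodes at creation, got {number_of_nodes}"
        )
    if number_of_nodes == 1:
        # Single-node: one node performs both leader and compute functions.
        return {"NodeType": node_type, "ClusterType": "single-node"}
    # Multi-node: NumberOfNodes counts compute nodes; the separate leader node
    # is provisioned for you and, per the text above, is not billed.
    return {
        "NodeType": node_type,
        "ClusterType": "multi-node",
        "NumberOfNodes": number_of_nodes,
    }


print(cluster_size_params("ra3.xlplus", 1))
print(cluster_size_params("ra3.4xlarge", 4))
```

The resulting dictionary could be merged into a boto3 call such as boto3.client("redshift").create_cluster(ClusterIdentifier=..., MasterUsername=..., MasterUserPassword=..., **params), with the remaining required values supplied by you.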

Amazon Redshift applies quotas to resources for each AWS account in each AWS Region. A quota restricts the number of resources that your account can create for a given resource type, such as nodes or snapshots, within an AWS Region. For more information about the default quotas that apply to Amazon Redshift resources, see Quotas and limits in Amazon Redshift.

The cost of your cluster depends on the AWS Region, node type, number of nodes, and whether the nodes are reserved in advance. For more information about the cost of nodes, see the Amazon Redshift pricing page.

Node type details

The following tables summarize the node specifications for each node type and size. The headings in the tables have these meanings:

  • vCPU is the number of virtual CPUs for each node.

  • RAM is the amount of memory in gibibytes (GiB) for each node.

  • Default slices per node is the number of slices into which a compute node is partitioned when a cluster is created or resized with classic resize.

    The number of slices per node might change if the cluster is resized using elastic resize. However, the total number of slices across all the compute nodes in the cluster remains the same after elastic resize.

    When you create a cluster with the restore from snapshot operation, the number of slices of the resulting cluster might change from the original cluster if you change the node type.

  • Storage is the capacity and type of storage for each node.

  • Node range is the minimum and maximum number of nodes that Amazon Redshift supports for the node type and size.

    Note

    You might be restricted to fewer nodes depending on the quota that is applied to your AWS account in the selected AWS Region. For more information about the default quotas that apply to Amazon Redshift resources, see Quotas and limits in Amazon Redshift.

  • Total capacity is the total storage capacity for the cluster if you deploy the maximum number of nodes that is specified in the node range.
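Because elastic resize preserves the total slice count while the node count changes, slices per node must change. The worked sketch below shows the arithmetic. Spreading slices as evenly as possible is an assumption made for illustration; Amazon Redshift decides the actual slice placement, and the function name is hypothetical.

```python
def elastic_resize_slices(
    node_count: int, slices_per_node: int, new_node_count: int
) -> list[int]:
    """Spread a cluster's unchanged total slice count over the new node count.

    Assumption: slices are spread as evenly as possible; Amazon Redshift decides
    the actual placement.
    """
    total_slices = node_count * slices_per_node  # preserved by elastic resize
    if not 1 <= new_node_count <= total_slices:
        raise ValueError("new node count must be between 1 and the total slice count")
    base, extra = divmod(total_slices, new_node_count)
    # The first `extra` nodes receive one additional slice.
    return [base + 1 if i < extra else base for i in range(new_node_count)]


# Four ra3.4xlarge nodes (4 default slices each) elastically resized to six nodes.
print(elastic_resize_slices(4, 4, 6))  # [3, 3, 3, 3, 2, 2]
```

With classic resize, by contrast, each node receives the default slices per node for its node type.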

The following table describes specifications for RA3 nodes.

Node type | vCPU | RAM (GiB) | Default slices per node | Managed storage limit per node [1] | Node range (create cluster) | Total managed storage capacity [2]
ra3.large (single-node) | 2 | 16 | 2 | 1 TB | 1 | 1 TB [3]
ra3.large (multi-node) | 2 | 16 | 2 | 8 TB | 2–16 | 128 TB
ra3.xlplus (single-node) | 4 | 32 | 2 | 4 TB | 1 | 4 TB [3]
ra3.xlplus (multi-node) | 4 | 32 | 2 | 32 TB | 2–16 [4] | 1024 TB [4]
ra3.4xlarge | 12 | 96 | 4 | 128 TB | 2–32 [5] | 8192 TB [5]
ra3.16xlarge | 48 | 384 | 16 | 128 TB | 2–128 | 16,384 TB

[1] The storage limit for Amazon Redshift managed storage. This is a hard limit.

[2] Total managed storage capacity is the maximum number of nodes times the managed storage limit per node.

[3] To resize a single-node cluster to multi-node, only classic resize is supported.

[4] You can create a cluster with the ra3.xlplus (multi-node) node type with up to 16 nodes. You can resize a multi-node cluster with elastic resize to a maximum of 32 nodes.

[5] You can create a cluster with the ra3.4xlarge node type with up to 32 nodes. You can resize it with elastic resize to a maximum of 64 nodes.
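The total managed storage column follows directly from footnote 2 (maximum number of nodes times the per-node limit). The sketch below reproduces that column from the table's own values. For ra3.xlplus (multi-node) and ra3.4xlarge, the maximum node count is the elastic resize limit given in footnotes 4 and 5. The dictionary and function names are illustrative.

```python
# (managed storage limit per node in TB, maximum node count) from the RA3 table.
# For ra3.xlplus (multi-node) and ra3.4xlarge, the maximum is the elastic resize
# limit from footnotes 4 and 5, not the create-cluster limit.
RA3_LIMITS = {
    "ra3.large (single-node)": (1, 1),
    "ra3.large (multi-node)": (8, 16),
    "ra3.xlplus (single-node)": (4, 1),
    "ra3.xlplus (multi-node)": (32, 32),
    "ra3.4xlarge": (128, 64),
    "ra3.16xlarge": (128, 128),
}


def total_managed_storage_tb(node_type: str) -> int:
    """Maximum number of nodes times the managed storage limit per node."""
    per_node_tb, max_nodes = RA3_LIMITS[node_type]
    return per_node_tb * max_nodes


for node_type in RA3_LIMITS:
    print(f"{node_type}: {total_managed_storage_tb(node_type):,} TB")
```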

The following table describes specifications for dense compute nodes.

Node type | vCPU | RAM (GiB) | Default slices per node | Storage per node | Node range | Total capacity
dc2.large | 2 | 15 | 2 | 160 GB NVMe-SSD | 1–32 | 5.12 TB
dc2.8xlarge | 32 | 244 | 16 | 2.56 TB NVMe-SSD | 2–128 | 326 TB

Note

Dense storage (DS2) node types are no longer available.

Previous node type names

In previous releases of Amazon Redshift, certain node types had different names. You can use the previous names in the Amazon Redshift API and AWS CLI. However, we recommend that you update any scripts that reference those names to use the current names instead. The current and previous names are as follows.

Current name | Previous names
ds2.xlarge | ds1.xlarge, dw.hs1.xlarge, dw1.xlarge
ds2.8xlarge | ds1.8xlarge, dw.hs1.8xlarge, dw1.8xlarge
dc1.large | dw2.large
dc1.8xlarge | dw2.8xlarge
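To update scripts that still reference previous node type names, a simple text rewrite based on the table above is enough. The sketch below maps names exactly as listed in the table; the helper names are hypothetical, and it deliberately matches only whole node type names, so a name followed directly by a period (for example at the end of a sentence) is left unchanged.

```python
import re

# Previous-to-current node type names, exactly as listed in the table above.
PREVIOUS_TO_CURRENT = {
    "ds1.xlarge": "ds2.xlarge",
    "dw.hs1.xlarge": "ds2.xlarge",
    "dw1.xlarge": "ds2.xlarge",
    "ds1.8xlarge": "ds2.8xlarge",
    "dw.hs1.8xlarge": "ds2.8xlarge",
    "dw1.8xlarge": "ds2.8xlarge",
    "dw2.large": "dc1.large",
    "dw2.8xlarge": "dc1.8xlarge",
}

# Match whole node type names only: not preceded or followed by a word
# character or a dot, so that e.g. "xdw2.large" is left alone.
_OLD_NAME = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(n) for n in sorted(PREVIOUS_TO_CURRENT, key=len, reverse=True))
    + r")(?![\w.])"
)


def current_node_type(name: str) -> str:
    """Return the current name for a previous node type name, else the name unchanged."""
    return PREVIOUS_TO_CURRENT.get(name, name)


def update_script(text: str) -> str:
    """Rewrite every previous node type name found in a script's text."""
    return _OLD_NAME.sub(lambda match: PREVIOUS_TO_CURRENT[match.group(1)], text)


print(update_script("aws redshift create-cluster --node-type dw2.large"))
# aws redshift create-cluster --node-type dc1.large
```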

Determining the number of nodes

Because Amazon Redshift distributes and runs queries in parallel across all of a cluster’s compute nodes, you can increase query performance by adding nodes to your cluster. When you run a cluster with at least two compute nodes, the data on each node is mirrored to the disks of another node to reduce the risk of data loss.

You can monitor query performance in the Amazon Redshift console and with Amazon CloudWatch metrics. You can also add or remove nodes as needed to achieve the right balance between price and performance for your cluster. When you request an additional node, Amazon Redshift takes care of all the details of deployment, load balancing, and data maintenance. For more information about cluster performance, see Monitoring Amazon Redshift cluster performance.

Reserved nodes are appropriate for steady-state production workloads, and offer significant discounts over on-demand nodes. You can purchase reserved nodes after running experiments and proof-of-concepts to validate your production configuration. For more information, see Reserved nodes.

When you pause a cluster, on-demand billing is suspended for as long as the cluster is paused. During this time, you pay only for backup storage. This frees you from planning and purchasing data warehouse capacity ahead of your needs, and lets you cost-effectively manage development and test environments.
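The effect of pausing on compute charges can be estimated with simple node-hour arithmetic. This sketch assumes that on-demand compute charges scale with the node-hours a cluster runs; it does not model backup storage charges, and the hourly rate is a value you supply from the Amazon Redshift pricing page rather than a real price. Function names are hypothetical.

```python
def on_demand_node_hours(
    number_of_nodes: int, period_hours: float, paused_hours: float
) -> float:
    """Node-hours subject to on-demand billing in a period with some paused time.

    Assumption: on-demand compute charges scale with running node-hours; paused
    hours incur no compute charge, but backup storage (not modeled) still does.
    """
    if not 0 <= paused_hours <= period_hours:
        raise ValueError("paused_hours must be between 0 and period_hours")
    return number_of_nodes * (period_hours - paused_hours)


def estimated_compute_cost(node_hours: float, hourly_rate_per_node: float) -> float:
    """hourly_rate_per_node is a value you look up on the pricing page."""
    return node_hours * hourly_rate_per_node


# A two-node development cluster that runs 50 hours in a 168-hour week.
hours = on_demand_node_hours(2, 168, 168 - 50)
print(hours)  # 100
```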

For information about pricing of on-demand and reserved nodes, see Amazon Redshift pricing.

Use EC2-VPC when you create your cluster

Amazon Redshift clusters run in Amazon EC2 instances that are configured for the Amazon Redshift node type and size that you select. Create your cluster using EC2-VPC. If you are still using EC2-Classic, we recommend that you move to EC2-VPC for improved performance and security. For more information about these networking platforms, see Supported Platforms in the Amazon EC2 User Guide. Your AWS account settings determine whether EC2-VPC or EC2-Classic is available to you.

Note

To prevent connection issues between SQL client tools and the Amazon Redshift database, we recommend doing one of two things. You can configure an inbound rule that enables the hosts to negotiate packet size. Alternatively, you can disable TCP/IP jumbo frames by setting the maximum transmission unit (MTU) to 1500 on the network interface (NIC) of your Amazon EC2 instances. For more information about these approaches, see Queries appear to hang and sometimes fail to reach the cluster.

EC2-VPC

When using EC2-VPC, your cluster runs in a virtual private cloud (VPC) that is logically isolated to your AWS account. If you provision your cluster in the EC2-VPC, you control access to your cluster by associating one or more VPC security groups with the cluster. For more information, see Security Groups for Your VPC in the Amazon VPC User Guide.

To create a cluster in a VPC, you must first create an Amazon Redshift cluster subnet group by providing subnet information of your VPC, and then provide the subnet group when launching the cluster. For more information, see Subnets for Redshift resources.
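The two-step order described above (create the cluster subnet group first, then reference it when launching the cluster) can be sketched as two request payloads. The parameter names are those of the CreateClusterSubnetGroup and CreateCluster APIs; the helper name, the group naming convention, and the example subnet and security group IDs are hypothetical.

```python
def vpc_request_params(
    cluster_id: str, subnet_ids: list[str], security_group_ids: list[str]
) -> tuple[dict, dict]:
    """Return (CreateClusterSubnetGroup params, networking subset of CreateCluster params)."""
    if not subnet_ids:
        raise ValueError("a cluster subnet group needs at least one subnet")
    group_name = f"{cluster_id}-subnet-group"  # hypothetical naming convention
    subnet_group = {
        "ClusterSubnetGroupName": group_name,
        "Description": f"Subnets for {cluster_id}",
        "SubnetIds": list(subnet_ids),
    }
    cluster_networking = {
        "ClusterSubnetGroupName": group_name,  # must exist before CreateCluster
        "VpcSecurityGroupIds": list(security_group_ids),  # controls access to the cluster
    }
    return subnet_group, cluster_networking


subnet_group, networking = vpc_request_params(
    "examplecluster", ["subnet-0example"], ["sg-0example"]
)
print(subnet_group["ClusterSubnetGroupName"])  # examplecluster-subnet-group
```

With boto3, you would pass the first dictionary to create_cluster_subnet_group and merge the second into create_cluster along with the other required parameters, in that order.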

For more information about Amazon Virtual Private Cloud (Amazon VPC), see the Amazon VPC product detail page.

Default disk space alarm

When you create an Amazon Redshift cluster, you can optionally configure an Amazon CloudWatch alarm to monitor the average percentage of disk space that is used across all of the nodes in your cluster. We’ll refer to this alarm as the default disk space alarm.

The purpose of the default disk space alarm is to help you monitor the storage capacity of your cluster. You can configure this alarm based on the needs of your data warehouse. For example, you can use the warning as an indicator that you might need to resize your cluster, either to a different node type or by adding nodes, or that you might need to purchase reserved nodes for future expansion.

The default disk space alarm triggers when disk usage reaches or exceeds a specified percentage for a specified number of periods, each of a specified duration. By default, this alarm triggers when disk usage reaches the percentage that you specify and stays at or above that percentage for five minutes or longer. You can edit the default values after you launch the cluster.
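The trigger condition above can be modeled as "N consecutive periodic readings at or above the threshold". This is a simplified sketch for intuition, not CloudWatch's actual evaluation logic, which also handles missing data and other alarm settings. It assumes one averaged reading per period (for example, five minutes), and the function name is hypothetical.

```python
def alarm_triggers(
    readings: list[float], threshold_percent: float, periods_to_alarm: int
) -> bool:
    """Simplified alarm check on periodic average disk-usage readings.

    Returns True once `periods_to_alarm` consecutive readings are at or above
    the threshold. Assumption: one reading per period (for example, 5 minutes).
    """
    consecutive = 0
    for value in readings:
        consecutive = consecutive + 1 if value >= threshold_percent else 0
        if consecutive >= periods_to_alarm:
            return True
    return False


# Five-minute averages; one period at or above 80 percent is enough to trigger.
print(alarm_triggers([72.0, 78.5, 80.0], 80.0, 1))  # True
```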

When the CloudWatch alarm triggers, Amazon Simple Notification Service (Amazon SNS) sends a notification to specified recipients to warn them that the percentage threshold is reached. Amazon SNS uses a topic to specify the recipients and message that are sent in a notification. You can use an existing Amazon SNS topic; otherwise, a topic is created based on the settings that you specify when you launch the cluster. You can edit the topic for this alarm after you launch the cluster. For more information about creating Amazon SNS topics, see Getting Started with Amazon Simple Notification Service.
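If you prefer to create a comparable alarm yourself rather than through the console, the CloudWatch PutMetricAlarm API accepts parameters like the ones built below for the AWS/Redshift PercentageDiskSpaceUsed metric, with the SNS topic as the alarm action. This is a sketch: the helper name and alarm name are hypothetical (the console-created alarm uses its own naming, described next), and the topic ARN is a value you supply.

```python
def disk_space_alarm_request(
    cluster_id: str,
    threshold_percent: float,
    sns_topic_arn: str,
    period_seconds: int = 300,
    evaluation_periods: int = 1,
) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm on Redshift disk usage."""
    if not 0 < threshold_percent <= 100:
        raise ValueError("threshold_percent must be in (0, 100]")
    return {
        # Hypothetical name; the console's default alarm is named
        # percentage-disk-space-used-default-<string>.
        "AlarmName": f"{cluster_id}-disk-space-used",
        "Namespace": "AWS/Redshift",
        "MetricName": "PercentageDiskSpaceUsed",
        "Dimensions": [{"Name": "ClusterIdentifier", "Value": cluster_id}],
        "Statistic": "Average",  # average of the metric over each period
        "Period": period_seconds,  # 300 seconds = five minutes
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold_percent),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that notifies recipients
    }
```

The dictionary can be passed as boto3.client("cloudwatch").put_metric_alarm(**request).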

After you launch the cluster, you can view and edit the alarm from the cluster’s Status window under CloudWatch Alarms. The name is percentage-disk-space-used-default-<string>. You can open the alarm to view the Amazon SNS topic that it is associated with and edit alarm settings. If you did not select an existing Amazon SNS topic to use, the one created for you is named <clustername>-default-alarms (<recipient>); for example, examplecluster-default-alarms (notify@example.com).

For more information about configuring and editing the default disk space alarm, see Creating a cluster and Creating a disk space alarm.

Note

If you delete your cluster, the alarm associated with the cluster is not deleted, but it no longer triggers. You can delete the alarm from the CloudWatch console if you no longer need it.

Cluster status

The cluster status displays the current state of the cluster. The following table provides a description for each cluster status.

Status | Description
available | The cluster is running and available.
available, prep-for-resize | The cluster is being prepared for elastic resize. The cluster is running and available for read and write queries, but cluster operations, such as creating a snapshot, are not available.
available, resize-cleanup | An elastic resize operation is completing data transfer to the new cluster nodes. The cluster is running and available for read and write queries, but cluster operations, such as creating a snapshot, are not available.
cancelling-resize | The resize operation is being cancelled.
creating | Amazon Redshift is creating the cluster. For more information, see Creating a cluster.
deleting | Amazon Redshift is deleting the cluster. For more information, see Shutting down and deleting a cluster.
final-snapshot | Amazon Redshift is taking a final snapshot of the cluster before deleting it. For more information, see Shutting down and deleting a cluster.
hardware-failure | The cluster suffered a hardware failure. If you have a single-node cluster, the node cannot be replaced. To recover your cluster, restore a snapshot. For more information, see Amazon Redshift snapshots and backups.
incompatible-hsm | Amazon Redshift cannot connect to the hardware security module (HSM). Check the HSM configuration between the cluster and HSM. For more information, see Encryption using hardware security modules.
incompatible-network | There is an issue with the underlying network configuration. Make sure that the VPC in which you launched the cluster exists and its settings are correct. For more information, see Redshift resources in a VPC.
incompatible-parameters | There is an issue with one or more parameter values in the associated parameter group, and the parameter value or values cannot be applied. Modify the parameter group and update any invalid values. For more information, see Amazon Redshift parameter groups.
incompatible-restore | There was an issue restoring the cluster from the snapshot. Try restoring the cluster again with a different snapshot. For more information, see Amazon Redshift snapshots and backups.
modifying | Amazon Redshift is applying changes to the cluster. For more information, see Modifying a cluster.
paused | The cluster is paused. For more information, see Pausing and resuming a cluster.
rebooting | Amazon Redshift is rebooting the cluster. For more information, see Rebooting a cluster.
renaming | Amazon Redshift is applying a new name to the cluster. For more information, see Renaming a cluster.
resizing | Amazon Redshift is resizing the cluster. For more information, see Resizing a cluster.
rotating-keys | Amazon Redshift is rotating encryption keys for the cluster. For more information, see Encryption key rotation.
storage-full | The cluster has reached its storage capacity. Resize the cluster to add nodes or to choose a different node size. For more information, see Resizing a cluster.
updating-hsm | Amazon Redshift is updating the HSM configuration.
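When you automate around cluster state, it can help to group the statuses in the table above into coarse categories. The grouping below is an illustrative assumption derived from the descriptions in the table, not an official classification. Statuses not in the table fall through to "unknown" rather than raising an error.

```python
AVAILABLE_DURING_RESIZE = {"available, prep-for-resize", "available, resize-cleanup"}
NEEDS_ACTION = {
    "hardware-failure",
    "incompatible-hsm",
    "incompatible-network",
    "incompatible-parameters",
    "incompatible-restore",
    "storage-full",
}
IN_PROGRESS = {
    "cancelling-resize", "creating", "deleting", "final-snapshot", "modifying",
    "rebooting", "renaming", "resizing", "rotating-keys", "updating-hsm",
}


def classify_status(status: str) -> str:
    """Group a cluster status string into a coarse, illustrative category."""
    if status == "available":
        return "available"
    if status in AVAILABLE_DURING_RESIZE:
        return "available-limited"  # queries run; operations such as snapshots do not
    if status in NEEDS_ACTION:
        return "needs-action"  # see the corrective step in the table
    if status == "paused":
        return "paused"
    if status in IN_PROGRESS:
        return "in-progress"
    return "unknown"  # status not listed in the table


print(classify_status("available, resize-cleanup"))  # available-limited
```

With boto3, the status string for a cluster is returned in the ClusterStatus field of each entry in the Clusters list from describe_clusters.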