

# What is Amazon Keyspaces (for Apache Cassandra)?
<a name="what-is-keyspaces"></a>

 Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and managed Apache Cassandra–compatible database service. With Amazon Keyspaces, you don’t have to provision, patch, or manage servers, and you don’t have to install, maintain, or operate software. 

Amazon Keyspaces is serverless, so you pay for only the resources that you use, and the service automatically scales tables up and down in response to application traffic. You can build applications that serve thousands of requests per second with virtually unlimited throughput and storage. 

**Note**  
 Apache Cassandra is an open-source, wide-column datastore that is designed to handle large amounts of data. For more information, see [Apache Cassandra](http://cassandra.apache.org/).

Amazon Keyspaces makes it easy to migrate, run, and scale Cassandra workloads in the AWS Cloud. With just a few clicks on the AWS Management Console or a few lines of code, you can create keyspaces and tables in Amazon Keyspaces, without deploying any infrastructure or installing software.

With Amazon Keyspaces, you can run your existing Cassandra workloads on AWS using the same Cassandra application code and developer tools that you use today. 

With the [pricing calculator for Amazon Keyspaces (for Apache Cassandra)](https://aws-samples.github.io/sample-pricing-calculator-for-keyspaces/#cassandra) available on GitHub, you can estimate your monthly costs for Amazon Keyspaces based on your existing Apache Cassandra workload. Enter metrics from your Cassandra `nodetool status` output and your intended serverless configuration for Amazon Keyspaces to compare direct costs between the two solutions. This calculator covers only the operational costs of Amazon Keyspaces compared to your existing Cassandra deployment. It doesn't include total cost of ownership (TCO) factors such as infrastructure maintenance, operational overhead, or support costs for Cassandra.

For a list of available AWS Regions and endpoints, see [Service endpoints for Amazon Keyspaces](https://docs.aws.amazon.com/keyspaces/latest/devguide/programmatic.endpoints.html).

We recommend that you start by reading the following sections:

**Topics**
+ [Amazon Keyspaces: How it works](how-it-works.md)
+ [Amazon Keyspaces use cases](use-cases.md)
+ [What is Cassandra Query Language (CQL)?](what-is-cql.md)

# Amazon Keyspaces: How it works
<a name="how-it-works"></a>

Amazon Keyspaces removes the administrative overhead of managing Cassandra. To understand why, it's helpful to begin with Cassandra architecture and then compare it to Amazon Keyspaces.

**Topics**
+ [High-level architecture: Apache Cassandra vs. Amazon Keyspaces](#how-it-works.cassandra-arch)
+ [Cassandra data model](#how-it-works.data-model)
+ [Accessing Amazon Keyspaces from an application](#how-it-works.keyspaces-arch.accessing)

## High-level architecture: Apache Cassandra vs. Amazon Keyspaces
<a name="how-it-works.cassandra-arch"></a>

 Traditional Apache Cassandra is deployed in a cluster made up of one or more nodes. You are responsible for managing each node and adding and removing nodes as your cluster scales. 

A client program accesses Cassandra by connecting to one of the nodes and issuing Cassandra Query Language (CQL) statements. *CQL* is similar to SQL, the popular language used in relational databases. Even though Cassandra is not a relational database, CQL provides a familiar interface for querying and manipulating data in Cassandra.

The following diagram shows a simple Apache Cassandra cluster, consisting of four nodes.

![\[Diagram of an Apache Cassandra cluster containing 4 nodes and interacting with client application.\]](http://docs.aws.amazon.com/keyspaces/latest/devguide/images/keyspaces_cassandra-hi-level.png)


A production Cassandra deployment might consist of hundreds of nodes, running on hundreds of physical computers across one or more physical data centers. This can cause an operational burden for application developers who need to provision, patch, and manage servers in addition to installing, maintaining, and operating software. 

With Amazon Keyspaces (for Apache Cassandra), you don’t need to provision, patch, or manage servers, so you can focus on building better applications. Amazon Keyspaces offers two throughput capacity modes for reads and writes: on-demand and provisioned. You can choose your table’s throughput capacity mode to optimize the price of reads and writes based on the predictability and variability of your workload. 

With on-demand mode, you pay for only the reads and writes that your application actually performs. You do not need to specify your table’s throughput capacity in advance. Amazon Keyspaces accommodates your application traffic almost instantly as it ramps up or down, making it a good option for applications with unpredictable traffic.

Provisioned capacity mode helps you optimize the price of throughput if you have predictable application traffic and can forecast your table’s capacity requirements in advance. With provisioned capacity mode, you specify the number of reads and writes per second that you expect your application to perform. You can increase and decrease the provisioned capacity for your table automatically by enabling [automatic scaling](https://docs.aws.amazon.com/keyspaces/latest/devguide/autoscaling.html).

You can change the capacity mode of your table once per day as you learn more about your workload’s traffic patterns, or if you expect to have a large burst in traffic, such as from a major event that you anticipate will drive a lot of table traffic. For more information about read and write capacity provisioning, see [Configure read/write capacity modes in Amazon Keyspaces](ReadWriteCapacityMode.md). 
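
As a rough illustration of switching modes, the following sketch builds the CQL statement that sets a table's capacity mode through the `CUSTOM_PROPERTIES` table option. The helper names, the keyspace and table names, and the capacity values are hypothetical and chosen only for this example. The sketch only produces statement text, which you would then run through `cqlsh` or a driver session.

```python
import json


def capacity_mode_clause(mode, read_units=None, write_units=None):
    """Return the CUSTOM_PROPERTIES clause for a throughput capacity mode."""
    if mode == "PAY_PER_REQUEST":  # on-demand mode
        capacity = {"throughput_mode": "PAY_PER_REQUEST"}
    elif mode == "PROVISIONED":
        if read_units is None or write_units is None:
            raise ValueError("provisioned mode needs read_units and write_units")
        capacity = {
            "throughput_mode": "PROVISIONED",
            "read_capacity_units": read_units,
            "write_capacity_units": write_units,
        }
    else:
        raise ValueError(f"unknown capacity mode: {mode}")
    # CQL map literals use single quotes, so swap the quotes that JSON emits.
    props = json.dumps({"capacity_mode": capacity}).replace('"', "'")
    return f"WITH CUSTOM_PROPERTIES = {props}"


def alter_capacity_statement(keyspace, table, mode, read_units=None, write_units=None):
    """Build an ALTER TABLE statement that changes the table's capacity mode."""
    clause = capacity_mode_clause(mode, read_units, write_units)
    return f"ALTER TABLE {keyspace}.{table} {clause}"


# Hypothetical keyspace and table names, used only for illustration.
print(alter_capacity_statement("my_keyspace", "weather_data", "PAY_PER_REQUEST"))
print(alter_capacity_statement("my_keyspace", "weather_data", "PROVISIONED", 10, 20))
```

Generating the clause from one helper keeps the on-demand and provisioned variants consistent, and rejects a provisioned request that omits the read or write capacity.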

Amazon Keyspaces (for Apache Cassandra) stores three copies of your data in multiple [Availability Zones](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/) for durability and high availability. In addition, you benefit from a data center and network architecture that is built to meet the requirements of the most security-sensitive organizations. Encryption at rest is automatically enabled when you create a new Amazon Keyspaces table, and all client connections require Transport Layer Security (TLS). Additional AWS security features include [monitoring](https://docs.aws.amazon.com/keyspaces/latest/devguide/monitoring.html), [AWS Identity and Access Management](https://docs.aws.amazon.com/keyspaces/latest/devguide/security_iam_service-with-iam.html), and [virtual private cloud (VPC) endpoints](https://docs.aws.amazon.com/keyspaces/latest/devguide/vpc-endpoints.html). For an overview of all available security features, see [Security in Amazon Keyspaces (for Apache Cassandra)](security.md). 

The following diagram shows the architecture of Amazon Keyspaces.

![\[Diagram of Amazon Keyspaces interacting with client application.\]](http://docs.aws.amazon.com/keyspaces/latest/devguide/images/keyspaces-hi-level.png)


A client program accesses Amazon Keyspaces by connecting to a predetermined endpoint (hostname and port number) and issuing CQL statements. For a list of available endpoints, see [Service endpoints for Amazon Keyspaces](programmatic.endpoints.md).

## Cassandra data model
<a name="how-it-works.data-model"></a>

How you model your data for your business case is critical to achieving optimal performance from Amazon Keyspaces. A poor data model can significantly degrade performance.

Even though CQL looks similar to SQL, the backends of Cassandra and relational databases are very different and must be approached differently. The following are some of the more significant issues to consider:

**Storage**  
You can visualize your Cassandra data in tables, with each row representing a record and each column a field within that record. 

**Table design: Query first**  
There are no `JOIN`s in CQL. Therefore, you should design your tables with the shape of your data and how you need to access it for your business use cases. This might result in de-normalization with duplicated data. You should design each of your tables specifically for a particular access pattern.

**Partitions**  
 Your data is stored in partitions on disk. The number of partitions your data is stored in and how it is distributed across the partitions is determined by your *partition key*. How you define your partition key can have a significant impact upon the performance of your queries. For best practices, see [How to use partition keys effectively in Amazon Keyspaces](bp-partition-key-design.md).

**Primary key**  
In Cassandra, data is stored as key-value pairs. Every Cassandra table must have a primary key, which uniquely identifies each row in the table. The primary key is the composite of a required partition key and optional clustering columns. The values that make up the primary key must be unique across all records in a table.  
+ **Partition key** – The partition key portion of the primary key is required and determines which partition of your cluster the data is stored in. The partition key can be a single column, or it can be a compound value composed of two or more columns. Use a compound partition key if a single-column partition key would result in one partition, or very few partitions, holding most of the data and thus bearing the majority of the disk I/O operations. 
+ **Clustering column** – The optional clustering portion of your primary key determines how the data is clustered and sorted within each partition. It can consist of one or more columns. If there are multiple clustering columns, the sort order follows the order in which the columns are listed, from left to right.
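
To make the structure of a primary key concrete, the following sketch assembles a `CREATE TABLE` statement from a partition key and clustering columns as described above. The helper names, the table name, and the column names are hypothetical. A compound partition key is wrapped in its own parentheses, and the clustering columns follow it in sort order.

```python
def primary_key_clause(partition_key, clustering_columns=()):
    """Build the PRIMARY KEY clause from partition and clustering columns."""
    if not partition_key:
        raise ValueError("a partition key is required")
    if len(partition_key) == 1:
        partition = partition_key[0]
    else:
        # A compound partition key is grouped in its own parentheses.
        partition = "(" + ", ".join(partition_key) + ")"
    # Clustering columns are listed after the partition key, in sort order.
    parts = [partition, *clustering_columns]
    return "PRIMARY KEY (" + ", ".join(parts) + ")"


def create_table_statement(table, columns, partition_key, clustering_columns=()):
    """Build a CREATE TABLE statement; columns maps column names to CQL types."""
    definitions = [f"{name} {cql_type}" for name, cql_type in columns.items()]
    definitions.append(primary_key_clause(partition_key, clustering_columns))
    return f"CREATE TABLE {table} (" + ", ".join(definitions) + ")"


# Hypothetical weather table: readings are partitioned by station and day,
# then sorted by reading time within each partition.
statement = create_table_statement(
    "my_keyspace.weather_data",
    {
        "station_id": "text",
        "reading_date": "date",
        "reading_time": "timestamp",
        "temperature": "double",
    },
    partition_key=["station_id", "reading_date"],
    clustering_columns=["reading_time"],
)
print(statement)
```

In this example, combining `station_id` with `reading_date` spreads one station's readings across many partitions instead of concentrating them in a single partition.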

For more information about NoSQL design and Amazon Keyspaces, see [Key differences and design principles of NoSQL design](bp-general-nosql-design.md). For more information about Amazon Keyspaces and data modeling, see [Data modeling best practices: recommendations for designing data models](data-modeling.md).

## Accessing Amazon Keyspaces from an application
<a name="how-it-works.keyspaces-arch.accessing"></a>

Amazon Keyspaces (for Apache Cassandra) implements the Apache Cassandra Query Language (CQL) API, so you can keep using the CQL statements and Cassandra drivers that you already use. To update your application, change your Cassandra driver or `cqlsh` configuration to point to the Amazon Keyspaces service endpoint. For more information about the required credentials, see [Create and configure AWS credentials for Amazon Keyspaces](access.credentials.md). 

**Note**  
To help you get started, you can find end-to-end code samples of connecting to Amazon Keyspaces by using various Cassandra client drivers in the Amazon Keyspaces code example repository on [GitHub](https://github.com/aws-samples/amazon-keyspaces-examples).

 Consider the following Python program, which connects to a Cassandra cluster and queries a table.

```
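from cassandra.cluster import Cluster
# TLS/SSL configuration goes here

ksp = 'MyKeyspace'
tbl = 'WeatherData'

# Replace the placeholders with a node's IP address and port number.
cluster = Cluster(['NNN.NNN.NNN.NNN'], port=NNNN)
session = cluster.connect(ksp)

# Make the keyspace the default for subsequent statements.
session.execute('USE ' + ksp)

# Read every row in the table and print it.
rows = session.execute('SELECT * FROM ' + tbl)
for row in rows:
    print(row)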
```

To run the same program against Amazon Keyspaces, you need to: 
+  **Add the cluster endpoint and port**: Replace the host with a service endpoint, such as `cassandra.us-east-1.amazonaws.com`, and the port number with `9142`. 
+  **Add the TLS/SSL configuration**: For more information on adding the TLS/SSL configuration to connect to Amazon Keyspaces by using a Cassandra client Python driver, see [Using a Cassandra Python client driver to access Amazon Keyspaces programmatically](using_python_driver.md). 
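
The two changes listed above might look like the following minimal sketch. It assumes that the `cassandra-driver` package is installed, that you authenticate with a user name and password (for example, service-specific credentials), and that the system trust store or a CA bundle you supply can verify the service certificate. The function names and the `ca_file` parameter are hypothetical, so treat this as a starting point rather than a definitive configuration.

```python
import ssl

KEYSPACES_PORT = 9142  # TLS port named in the list above


def build_tls_context(ca_file=None):
    """Create a TLS context that verifies the server certificate and host name.

    ca_file is an optional path to a CA bundle. When it is omitted, this
    sketch assumes the system's default trust store is sufficient.
    """
    return ssl.create_default_context(cafile=ca_file)


def connect(endpoint, username, password, ca_file=None):
    """Open a driver session against a service endpoint over TLS."""
    # Imported here so that build_tls_context() works without the driver.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    cluster = Cluster(
        [endpoint],
        port=KEYSPACES_PORT,
        ssl_context=build_tls_context(ca_file),
        # Host name used when the driver verifies the server certificate.
        ssl_options={"server_hostname": endpoint},
        auth_provider=PlainTextAuthProvider(username=username, password=password),
    )
    return cluster.connect()
```

A call such as `connect('cassandra.us-east-1.amazonaws.com', user, password)` would then return a session that you can use in place of the one in the earlier program.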

# Amazon Keyspaces use cases
<a name="use-cases"></a>

The following are just some of the ways in which you can use Amazon Keyspaces:
+  **Build applications that require low latency** – Process data at high speeds for applications that require single-digit-millisecond latency, such as industrial equipment maintenance, trade monitoring, fleet management, and route optimization. 
+  **Build applications using open-source technologies** – Build applications on AWS using open-source Cassandra APIs and drivers that are available for a wide range of programming languages, such as Java, Python, Ruby, Microsoft .NET, Node.js, PHP, C++, Perl, and Go. For code examples, see [Amazon Keyspaces (for Apache Cassandra) libraries and tools](examples-tools.md).
+  **Move your Cassandra workloads to the cloud** – Managing Cassandra tables yourself is time-consuming and expensive. With Amazon Keyspaces, you can set up, secure, and scale Cassandra tables in the AWS Cloud without managing infrastructure. For more information, see [Managing serverless resources in Amazon Keyspaces (for Apache Cassandra)](serverless_resource_management.md).

# What is Cassandra Query Language (CQL)?
<a name="what-is-cql"></a>

*Cassandra Query Language* (CQL) is the primary language for communicating with Apache Cassandra. Amazon Keyspaces (for Apache Cassandra) is compatible with the CQL 3.x API (backward-compatible with version 2.x). 

In CQL, data is stored in tables, columns, and rows. In this sense, CQL is similar to Structured Query Language (SQL). The following are the key concepts in CQL.
+ **CQL elements** – The fundamental elements of CQL are identifiers, constants, terms, and data types. 
+ **Data Definition Language (DDL)** – DDL statements are used to manage data structures like keyspaces and tables, which are AWS resources in Amazon Keyspaces. DDL statements are control plane operations in AWS.
+ **Data Manipulation Language (DML)** – DML statements are used to manage data within tables by selecting, inserting, updating, and deleting data. These are data plane operations in AWS.
+ **Built-in functions** – Amazon Keyspaces supports a variety of built-in scalar functions that you can use in CQL statements. 
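
As a small illustration of the DDL and DML distinction above, the following sketch classifies a CQL statement by its leading keyword. The function name and the sample statements are hypothetical, and the keyword sets cover only the common statement types, so this is a simplification rather than a complete CQL parser.

```python
# Leading keywords of the common statement types (not an exhaustive list).
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP"}
DML_KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def statement_kind(cql):
    """Classify a CQL statement as 'DDL', 'DML', or 'other'."""
    words = cql.split()
    if not words:
        raise ValueError("empty statement")
    keyword = words[0].upper()  # CQL keywords are case-insensitive
    if keyword in DDL_KEYWORDS:
        return "DDL"  # manages keyspaces and tables (control plane)
    if keyword in DML_KEYWORDS:
        return "DML"  # reads or writes table data (data plane)
    return "other"


# Hypothetical statements, used only for illustration.
for cql in (
    "CREATE TABLE my_keyspace.weather_data (station_id text PRIMARY KEY)",
    "SELECT * FROM my_keyspace.weather_data",
    "drop table my_keyspace.weather_data",
):
    print(statement_kind(cql), "-", cql)
```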

For more information about CQL, see [CQL language reference for Amazon Keyspaces (for Apache Cassandra)](cql.md). For functional differences with Apache Cassandra, see [Functional differences: Amazon Keyspaces vs. Apache Cassandra](functional-differences.md).

To run CQL queries, you can do one of the following:
+ Use the CQL editor in the AWS Management Console.
+ Use AWS CloudShell and the [cqlsh-expansion](programmatic.cqlsh.md#using_cqlsh).
+ Use a `cqlsh` client.
+ Use an Apache 2.0 licensed Cassandra client driver.

In addition to CQL, you can perform Data Definition Language (DDL) operations in Amazon Keyspaces using the AWS SDKs and the AWS Command Line Interface.

For more information about using these methods to access Amazon Keyspaces, see [Accessing Amazon Keyspaces (for Apache Cassandra)](accessing.md). 