

# What is Amazon DocumentDB (with MongoDB compatibility)
<a name="what-is"></a>

Amazon DocumentDB (with MongoDB compatibility) is a fast, reliable, and fully managed database service. Amazon DocumentDB makes it easy to set up, operate, and scale MongoDB-compatible databases in the cloud. With Amazon DocumentDB, you can run the same application code and use the same drivers and tools that you use with MongoDB.

Before using Amazon DocumentDB, you should review the concepts and features described in [How it works](how-it-works.md). After that, complete the steps in [Get started guide](get-started-guide.md).

**Topics**
+ [Overview](#overview)
+ [Clusters](#what-is-db-clusters)
+ [Instances](#what-is-db-instances)
+ [Regions and AZs](#what-is-regions-and-azs)
+ [Pricing](#docdb-pricing)
+ [Monitoring](#what-is-monitoring)
+ [Interfaces](#what-is-interfaces)
+ [What's next?](#what-is-next)
+ [How it works](how-it-works.md)
+ [What is a document database?](what-is-document-db.md)

## Overview of Amazon DocumentDB
<a name="overview"></a>

The following are some high-level features of Amazon DocumentDB:
+ Amazon DocumentDB supports two types of clusters: instance-based clusters and elastic clusters. Elastic clusters support workloads with millions of reads/writes per second and petabytes of storage capacity. For more information about elastic clusters, see [Using Amazon DocumentDB elastic clusters](docdb-using-elastic-clusters.md). The content below refers to Amazon DocumentDB instance-based clusters.
+ Amazon DocumentDB automatically grows the size of your storage volume as your database storage needs grow. Your storage volume grows in increments of 10 GB, up to a maximum of 128 TiB. You don't need to provision any excess storage for your cluster to handle future growth.
+ With Amazon DocumentDB, you can increase read throughput to support high-volume application requests by creating up to 15 replica instances. Amazon DocumentDB replicas share the same underlying storage, which lowers costs and avoids the need to perform writes on the replica nodes. This frees up more processing power to serve read requests and reduces replica lag, often to single-digit milliseconds. You can add replicas in minutes regardless of the storage volume size. Amazon DocumentDB also provides a reader endpoint, so your application can connect without having to track replicas as they are added and removed.
+ Amazon DocumentDB lets you scale the compute and memory resources for each of your instances up or down. Compute scaling operations typically complete in a few minutes.
+ Amazon DocumentDB runs in Amazon Virtual Private Cloud (Amazon VPC), so you can isolate your database in your own virtual network. You can also configure firewall settings to control network access to your cluster.
+ Amazon DocumentDB continuously monitors the health of your cluster. On an instance failure, Amazon DocumentDB automatically restarts the instance and associated processes. Amazon DocumentDB doesn't require a crash recovery replay of database redo logs, which greatly reduces restart times. Amazon DocumentDB also isolates the database cache from the database process, enabling the cache to survive an instance restart.
+ On instance failure, Amazon DocumentDB automates failover to one of up to 15 Amazon DocumentDB replicas that you create in other Availability Zones. If no replicas have been provisioned and a failure occurs, Amazon DocumentDB tries to create a new Amazon DocumentDB instance automatically.
+ The backup capability in Amazon DocumentDB enables point-in-time recovery for your cluster. This feature allows you to restore your cluster to any second during your retention period, up to the last 5 minutes. You can configure your automatic backup retention period up to 35 days. Automated backups are stored in Amazon Simple Storage Service (Amazon S3), which is designed for 99.999999999% durability. Amazon DocumentDB backups are automatic, incremental, and continuous, and they have no impact on your cluster performance.
+ With Amazon DocumentDB, you can encrypt your databases using keys that you create and control through AWS Key Management Service (AWS KMS). On a database cluster running with Amazon DocumentDB encryption, data stored at rest in the underlying storage is encrypted. The automated backups, snapshots, and replicas in the same cluster are also encrypted.
+ Amazon DocumentDB is authorized under Federal Risk and Authorization Management Program (FedRAMP). It has FedRAMP High authorization for AWS GovCloud (US) regions and FedRAMP Moderate authorization for AWS US East/West Regions. For details about AWS and compliance efforts, see [AWS Services in Scope by Compliance Program](https://aws.amazon.com/compliance/services-in-scope/FedRAMP/).
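The point-in-time recovery bullet above implies a restorable window bounded by the retention period at one end and by roughly the last 5 minutes at the other. The following is a minimal sketch of that arithmetic under simplifying assumptions: it assumes backups have been running for the whole retention period, and the 1-day minimum retention is an assumption to verify against the backup documentation. The function names are hypothetical helpers, not part of any AWS API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: retention can be set from 1 to 35 days. The 35-day maximum comes
# from the text above; the 1-day minimum should be verified separately.
MIN_RETENTION_DAYS = 1
MAX_RETENTION_DAYS = 35
# The latest restorable time trails the current time by roughly 5 minutes.
RESTORE_LAG = timedelta(minutes=5)


def restore_window(now: datetime, retention_days: int) -> tuple:
    """Return (earliest, latest) point-in-time restore targets in a simplified model."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention must be {MIN_RETENTION_DAYS}-{MAX_RETENTION_DAYS} days"
        )
    # Simplification: assumes backups have existed for the whole retention period.
    earliest = now - timedelta(days=retention_days)
    latest = now - RESTORE_LAG
    return earliest, latest


def can_restore_to(target: datetime, now: datetime, retention_days: int) -> bool:
    """True if `target` falls inside the simplified restorable window."""
    earliest, latest = restore_window(now, retention_days)
    return earliest <= target <= latest


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
print(can_restore_to(now - timedelta(hours=1), now, 7))  # True
```

In practice, the console or AWS CLI reports the actual earliest and latest restorable times for a cluster; this sketch only shows how the two bounds described above relate to each other.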

If you are new to AWS services, use the following resources to learn more:
+ AWS offers services for computing, databases, storage, analytics, and other functionality. For an overview of all AWS services, see [Cloud Computing with Amazon Web Services](https://aws.amazon.com/what-is-aws/).
+ AWS provides a number of database services. For guidance on which service is best for your environment, see [Databases on AWS](https://aws.amazon.com/products/databases/).

## Clusters
<a name="what-is-db-clusters"></a>

A *cluster* consists of 0 to 16 instances and a cluster storage volume that manages the data for those instances. All writes are done through the primary instance. All instances (primary and replicas) support reads. The cluster's data is stored in the cluster volume with copies in three different Availability Zones.

![\[Amazon DocumentDB cluster containing primary instance in Availability Zone 1, writing to cluster volume for replicas in zones 2 and 3.\]](http://docs.aws.amazon.com/documentdb/latest/developerguide/images/how-it-works-01c.png)


Amazon DocumentDB 5.0 instance-based clusters support two storage configurations for a database cluster: Amazon DocumentDB standard and Amazon DocumentDB I/O-optimized. For more information, see [Amazon DocumentDB cluster storage configurations](db-cluster-storage-configs.md).

## Instances
<a name="what-is-db-instances"></a>

An Amazon DocumentDB instance is an isolated database environment in the cloud. An instance can contain multiple user-created databases. You can create and modify an instance using the AWS Management Console or the AWS CLI.

The computation and memory capacity of an instance are determined by its *instance class*. You can select the instance that best meets your needs. If your needs change over time, you can choose a different instance class. For instance class specifications, see [Instance class specifications](db-instance-classes.md#db-instance-class-specs).

Amazon DocumentDB instances run only in the Amazon VPC environment. Amazon VPC gives you control of your virtual networking environment: You can choose your own IP address range, create subnets, and configure routing and access control lists (ACLs).

Before you can create Amazon DocumentDB instances, you must create a cluster to contain the instances.

Not all instance classes are supported in every Region. The following table shows which instance classes are supported in each Region.

**Note**  
For a complete list of instance types supported by Amazon DocumentDB in each instance class, see [Instance class specifications](db-instance-classes.md#db-instance-class-specs).


**Supported instance classes by Region**  

| Region | R8G | R6GD | R6G | R5 | R4 | T4G | T3 | Serverless | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| US East (Ohio) | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | 
| US East (N. Virginia) | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | 
| US West (Oregon) | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | 
| Africa (Cape Town) |  |  | Supported | Supported |  | Supported | Supported | Supported | 
| South America (São Paulo) |  | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| Asia Pacific (Hong Kong) |  |  | Supported | Supported |  | Supported | Supported | Supported | 
| Asia Pacific (Hyderabad) |  |  | Supported | Supported |  | Supported | Supported | Supported | 
| Asia Pacific (Malaysia) |  |  | Supported |  |  | Supported | Supported |  | 
| Asia Pacific (Mumbai) | Supported | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| Asia Pacific (Osaka) |  | Supported | Supported | Supported |  | Supported | Supported |  | 
| Asia Pacific (Seoul) | Supported | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| Asia Pacific (Sydney) | Supported | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| Asia Pacific (Jakarta) | Supported | Supported | Supported | Supported |  | Supported | Supported |  | 
| Asia Pacific (Melbourne) |  |  | Supported | Supported |  | Supported | Supported |  | 
| Asia Pacific (Singapore) | Supported | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| Asia Pacific (Thailand) |  |  | Supported |  |  | Supported | Supported |  | 
| Asia Pacific (Tokyo) | Supported | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| Canada (Central) |  | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| Europe (Frankfurt) | Supported | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| Europe (Zurich) |  | Supported | Supported | Supported |  | Supported | Supported |  | 
| Europe (Ireland) | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | 
| Europe (London) |  | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| Europe (Milan) |  |  | Supported | Supported |  | Supported | Supported | Supported | 
| Europe (Paris) |  | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| Europe (Spain) | Supported | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| Europe (Stockholm) | Supported | Supported | Supported | Supported |  | Supported | Supported |  | 
| Mexico (Central) |  |  | Supported |  |  | Supported | Supported |  | 
| Middle East (UAE) |  |  | Supported | Supported |  | Supported | Supported | Supported | 
| China (Beijing) |  | Supported | Supported | Supported |  | Supported | Supported | Supported | 
| China (Ningxia) |  |  | Supported | Supported |  | Supported | Supported | Supported | 
| Israel (Tel Aviv) |  |  | Supported | Supported |  | Supported | Supported | Supported | 
| AWS GovCloud (US-West) | Supported | Supported | Supported | Supported |  |  | Supported | Supported | 
| AWS GovCloud (US-East) |  | Supported | Supported | Supported |  | Supported | Supported | Supported | 

## Regions and availability zones
<a name="what-is-regions-and-azs"></a>

Regions and Availability Zones define the physical locations of your cluster and instances.

### Regions
<a name="what-is-regions"></a>

AWS Cloud computing resources are housed in highly available data center facilities in different areas of the world (for example, North America, Europe, or Asia). Each data center location is called a *Region*.

Each AWS Region is designed to be completely isolated from the other AWS Regions. Within each are multiple Availability Zones. By launching your nodes in different Availability Zones, you can achieve the greatest possible fault tolerance. The following diagram shows a high-level view of how AWS Regions and Availability Zones work.

![\[Amazon DocumentDB high-level view of AWS Regions and Availability Zones.\]](http://docs.aws.amazon.com/documentdb/latest/developerguide/images/docdb-regions-and-azs.png)


### Availability zones
<a name="what-is-availability-zones"></a>

Each AWS Region contains multiple distinct locations called *Availability Zones*. Each Availability Zone is engineered to be isolated from failures in other Availability Zones, and to provide inexpensive, low-latency network connectivity to other Availability Zones in the same Region. By launching instances for a given cluster in multiple Availability Zones, you can protect your applications from the unlikely event of an Availability Zone failing.

The Amazon DocumentDB architecture separates storage and compute. For the storage layer, Amazon DocumentDB replicates six copies of your data across three AWS Availability Zones. As an example, if you are launching an Amazon DocumentDB cluster in a Region that only supports two Availability Zones, your data storage will be replicated six ways across three Availability Zones but your compute instances will only be available in two Availability Zones.

 The following table lists the number of Availability Zones that you can use in a given AWS Region to provision compute instances for your cluster.


| Region Name | Region | Availability Zones (compute) | 
| --- | --- | --- | 
| US East (Ohio) | `us-east-2` | 3 | 
| US East (N. Virginia) | `us-east-1` | 6 | 
| US West (Oregon) | `us-west-2` | 4 | 
| Africa (Cape Town) | `af-south-1` | 3 | 
| South America (São Paulo) | `sa-east-1` | 3 | 
| Asia Pacific (Hong Kong) | `ap-east-1` | 3 | 
| Asia Pacific (Hyderabad) | `ap-south-2` | 3 | 
| Asia Pacific (Malaysia) | `ap-southeast-5` | 3 | 
| Asia Pacific (Mumbai) | `ap-south-1` | 3 | 
| Asia Pacific (Osaka) | `ap-northeast-3` | 3 | 
| Asia Pacific (Seoul) | `ap-northeast-2` | 4 | 
| Asia Pacific (Singapore) | `ap-southeast-1` | 3 | 
| Asia Pacific (Sydney) | `ap-southeast-2` | 3 | 
| Asia Pacific (Jakarta) | `ap-southeast-3` | 3 | 
| Asia Pacific (Melbourne) | `ap-southeast-4` | 3 | 
| Asia Pacific (Thailand) | `ap-southeast-7` | 3 | 
| Asia Pacific (Tokyo) | `ap-northeast-1` | 3 | 
| Canada (Central) | `ca-central-1` | 3 | 
| China (Beijing) | `cn-north-1` | 3 | 
| China (Ningxia) | `cn-northwest-1` | 3 | 
| Europe (Frankfurt) | `eu-central-1` | 3 | 
| Europe (Zurich) | `eu-central-2` | 3 | 
| Europe (Ireland) | `eu-west-1` | 3 | 
| Europe (London) | `eu-west-2` | 3 | 
| Europe (Milan) | `eu-south-1` | 3 | 
| Europe (Paris) | `eu-west-3` | 3 | 
| Europe (Spain) | `eu-south-2` | 3 | 
| Europe (Stockholm) | `eu-north-1` | 3 | 
| Mexico (Central) | `mx-central-1` | 3 | 
| Middle East (UAE) | `me-central-1` | 3 | 
| Israel (Tel Aviv) | `il-central-1` | 3 | 
| AWS GovCloud (US-West) | `us-gov-west-1` | 3 | 
| AWS GovCloud (US-East) | `us-gov-east-1` | 3 | 
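To illustrate how the counts in the table above bound instance placement, the following sketch spreads a cluster's instances round-robin across the compute Availability Zones of a Region, so that no zone receives a second instance until every zone has one. It is an illustration only: the counts are copied from a few rows of the table, and the zone labels (Region name plus a letter) are hypothetical, since real zone names and availability can differ per account.

```python
# Compute Availability Zone counts copied from a few rows of the table above.
COMPUTE_AZ_COUNTS = {"us-east-1": 6, "us-west-2": 4, "us-east-2": 3}
MAX_INSTANCES = 16  # one primary plus up to 15 replicas


def place_instances(region: str, instance_count: int) -> list:
    """Spread a cluster's instances across AZs round-robin.

    AZ labels are hypothetical (Region name plus a letter); real AZ names
    and availability can differ per account.
    """
    if not 0 <= instance_count <= MAX_INSTANCES:
        raise ValueError(f"a cluster can have 0-{MAX_INSTANCES} instances")
    az_count = COMPUTE_AZ_COUNTS[region]
    labels = [f"{region}{chr(ord('a') + i)}" for i in range(az_count)]
    # Instance i goes to AZ (i mod az_count), so no AZ gets a second instance
    # until every AZ has one.
    return [labels[i % az_count] for i in range(instance_count)]


print(place_instances("us-east-2", 4))
```

With four instances in a three-zone Region, the fourth instance shares a zone with the first; storage is still replicated six ways across three zones regardless of where the instances run.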

## Amazon DocumentDB Pricing
<a name="docdb-pricing"></a>

Amazon DocumentDB clusters are billed based on the following components: 
+ **Instance hours (per hour)**: Based on the instance class of the instance (for example, `db.r5.xlarge`). Pricing is listed on a per-hour basis, but bills are calculated down to the second and show times in decimal form. Amazon DocumentDB usage is billed in one-second increments, with a minimum of 10 minutes. For more information, see [Managing instance classes](db-instance-classes.md).
+ **I/O requests (per 1 million requests per month)**: The total number of storage I/O requests that you make in a billing cycle.
+ **Backup storage (per GiB per month)**: Backup storage is the storage associated with automated database backups and any active database snapshots that you have taken. Increasing your backup retention period or taking additional database snapshots increases the backup storage consumed by your database. Backup storage is metered in GB-months; per-second billing does not apply. For more information, see [Backing up and restoring in Amazon DocumentDB](backup_restore.md).
+ **Data transfer (per GB)**: Data transferred in and out of your instance from or to the internet or other AWS Regions.
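The instance-hours rule above (per-second billing with a 10-minute minimum, shown in decimal hours) can be sketched as follows. This is a minimal illustration for a single continuous instance run; the hourly rate is a placeholder, not a published price, and the helper names are hypothetical.

```python
MIN_BILLED_SECONDS = 10 * 60  # 10-minute minimum, per the description above
SECONDS_PER_HOUR = 3600


def billed_instance_hours(runtime_seconds: int) -> float:
    """Billed instance-hours (decimal) for one continuous instance run."""
    if runtime_seconds < 0:
        raise ValueError("runtime cannot be negative")
    # Per-second billing, but never less than the 10-minute minimum.
    return max(runtime_seconds, MIN_BILLED_SECONDS) / SECONDS_PER_HOUR


def estimate_instance_charge(runtime_seconds: int, hourly_rate: float) -> float:
    """hourly_rate is a placeholder; look up the real rate on the pricing page."""
    return round(billed_instance_hours(runtime_seconds) * hourly_rate, 6)


print(billed_instance_hours(2 * 60))   # a 2-minute run is billed as 10 minutes
print(billed_instance_hours(90 * 60))  # 1.5
```

This covers only the instance component; I/O, backup storage, and data transfer are metered separately as listed above.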

For detailed information, see [Amazon DocumentDB pricing](https://aws.amazon.com/documentdb/pricing/).

### Free trial
<a name="free-trial"></a>

You can try Amazon DocumentDB for free using the 1-month free trial. For more information, see Free trial in [Amazon DocumentDB pricing](https://aws.amazon.com/documentdb/pricing/) or see the [Amazon DocumentDB free trial FAQ](https://aws.amazon.com/documentdb/free-trial/).

## Monitoring
<a name="what-is-monitoring"></a>

There are several ways that you can track the performance and health of an instance. You can use the free Amazon CloudWatch service to monitor the performance and health of an instance. You can find performance charts on the Amazon DocumentDB console. You can subscribe to Amazon DocumentDB events to be notified when changes occur with an instance, snapshot, parameter group, or security group.

For more information, see the following:
+ [Monitoring Amazon DocumentDB with CloudWatch](cloud_watch.md)
+ [Logging Amazon DocumentDB API calls with AWS CloudTrail](logging-with-cloudtrail.md)

## Interfaces
<a name="what-is-interfaces"></a>

There are multiple ways for you to interact with Amazon DocumentDB, including the AWS Management Console and the AWS CLI.

### AWS Management Console
<a name="what-is-console"></a>

The AWS Management Console is a simple web-based user interface. You can manage your clusters and instances from the console with no programming required. To access the Amazon DocumentDB console, sign in to the AWS Management Console and open the Amazon DocumentDB console at [https://console.aws.amazon.com/docdb](https://console.aws.amazon.com/docdb). 

### AWS CLI
<a name="what-is-cli"></a>

You can use the AWS Command Line Interface (AWS CLI) to manage your Amazon DocumentDB clusters and instances. With minimal configuration, you can start using all of the functionality provided by the Amazon DocumentDB console from your favorite terminal program.
+ To install the AWS CLI, see [Installing the AWS Command Line Interface](https://docs.aws.amazon.com/cli/latest/userguide/installing.html).
+ To begin using the AWS CLI for Amazon DocumentDB, see [AWS Command Line Interface Reference for Amazon DocumentDB](https://docs.aws.amazon.com/cli/latest/reference/docdb/index.html).

### MongoDB drivers
<a name="what-is-mongodb-drivers"></a>

For developing and writing applications against an Amazon DocumentDB cluster, you can also use the MongoDB drivers with Amazon DocumentDB. For more information, see the MongoDB shell tab in [Connecting with TLS enabled](connect_programmatically.md#connect_programmatically-tls_enabled) or [Connecting with TLS disabled](connect_programmatically.md#connect_programmatically-tls_disabled).

## What's next?
<a name="what-is-next"></a>

The preceding sections introduced you to the basic infrastructure components that Amazon DocumentDB offers. What should you do next? Depending upon your circumstances, see one of the following topics to get started:
+ Get started with Amazon DocumentDB by creating a cluster and instance using AWS CloudFormation. For instructions, see [Amazon DocumentDB quick start using CloudFormation](quick_start_cfn.md).
+ Get started with Amazon DocumentDB by creating a cluster and instance using the instructions in our [Get started guide](get-started-guide.md).
+ Get started with Amazon DocumentDB by creating an elastic cluster using the instructions in [Get started with Amazon DocumentDB elastic clusters](elastic-get-started.md).
+ Migrate your MongoDB implementation to Amazon DocumentDB using the guidance at [Migrating to Amazon DocumentDB](docdb-migration.md).

# Amazon DocumentDB: how it works
<a name="how-it-works"></a>

Amazon DocumentDB (with MongoDB compatibility) is a fully managed, MongoDB-compatible database service. With Amazon DocumentDB, you can run the same application code and use the same drivers and tools that you use with MongoDB. Amazon DocumentDB is compatible with MongoDB 3.6, 4.0, 5.0, and 8.0.

**Topics**
+ [Amazon DocumentDB endpoints](#how-it-works.endpoints)
+ [TLS support](#how-it-works.ssl)
+ [Amazon DocumentDB storage](#how-it-works.storage)
+ [Amazon DocumentDB replication](#how-it-works.replication)
+ [Amazon DocumentDB reliability](#how-it-works.reliability)
+ [Read preference options](#durability-consistency-isolation)
+ [TTL deletes](#how-it-works.ttl-deletes)
+ [Billable resources](#billing)

When you use Amazon DocumentDB, you begin by creating a *cluster*. A cluster consists of zero or more database instances and a cluster volume that manages the data for those instances. An Amazon DocumentDB *cluster volume* is a virtual database storage volume that spans multiple Availability Zones. Each Availability Zone has a copy of the cluster data.

An Amazon DocumentDB cluster consists of two components:
+ **Cluster volume**—Uses a cloud-native storage service to replicate data six ways across three Availability Zones, providing highly durable and available storage. An Amazon DocumentDB cluster has exactly one cluster volume, which can store up to 128 TiB of data.
+ **Instances**—Provide the processing power for the database, writing data to, and reading data from, the cluster storage volume. An Amazon DocumentDB cluster can have 0–16 instances. 

Instances serve one of two roles:
+ **Primary instance**—Supports read and write operations, and performs all the data modifications to the cluster volume. Each Amazon DocumentDB cluster has one primary instance.
+ **Replica instance**—Supports only read operations. An Amazon DocumentDB cluster can have up to 15 replicas in addition to the primary instance. Having multiple replicas enables you to distribute read workloads. In addition, by placing replicas in separate Availability Zones, you also increase your cluster availability.

The following diagram illustrates the relationship between the cluster volume, the primary instance, and replicas in an Amazon DocumentDB cluster:

![\[Amazon DocumentDB endpoints including the cluster, reader, and instance endpoints.\]](http://docs.aws.amazon.com/documentdb/latest/developerguide/images/docdb-endpoint-types.png)


Cluster instances do not need to be of the same instance class, and they can be provisioned and terminated as desired. This architecture lets you scale your cluster’s compute capacity independently of its storage.

When your application writes data to the primary instance, the primary executes a durable write to the cluster volume. It then replicates the state of that write (not the data) to each active replica. Amazon DocumentDB replicas do not participate in processing writes, which makes them well suited to read scaling. Reads from Amazon DocumentDB replicas are eventually consistent with minimal replica lag, usually less than 100 milliseconds after the primary instance writes the data. Reads from the replicas are guaranteed to reflect writes in the order in which they were made on the primary. Replica lag varies depending on the rate of data change, and periods of high write activity might increase it. For more information, see the `ReplicationLag` metrics at [Amazon DocumentDB metrics](cloud_watch.md#cloud_watch-metrics_list). 

## Amazon DocumentDB endpoints
<a name="how-it-works.endpoints"></a>

Amazon DocumentDB provides multiple connection options to serve a wide range of use cases. To connect to an instance in an Amazon DocumentDB cluster, you specify the instance's endpoint. An *endpoint* is a host address and a port number, separated by a colon.

We recommend that you connect to your cluster using the cluster endpoint and in replica set mode (see [Connecting to Amazon DocumentDB as a replica set](connect-to-replica-set.md)) unless you have a specific use case for connecting to the reader endpoint or an instance endpoint. To route requests to your replicas, choose a driver read preference setting that maximizes read scaling while meeting your application's read consistency requirements. The `secondaryPreferred` read preference enables replica reads and frees up the primary instance to do more work.

The following endpoints are available from an Amazon DocumentDB cluster.

### Cluster Endpoint
<a name="how-it-works.endpoints.cluster"></a>

The *cluster endpoint* connects to your cluster’s current primary instance. The cluster endpoint can be used for read and write operations. An Amazon DocumentDB cluster has exactly one cluster endpoint.

The cluster endpoint provides failover support for read and write connections to the cluster. If your cluster’s current primary instance fails, and your cluster has at least one active read replica, the cluster endpoint automatically redirects connection requests to a new primary instance. When connecting to your Amazon DocumentDB cluster, we recommend that you connect to your cluster using the cluster endpoint and in replica set mode (see [Connecting to Amazon DocumentDB as a replica set](connect-to-replica-set.md)).

The following is an example Amazon DocumentDB cluster endpoint:

```
sample-cluster.cluster-123456789012.us-east-1.docdb.amazonaws.com:27017
```

The following is an example connection string using this cluster endpoint:

```
mongodb://username:password@sample-cluster.cluster-123456789012.us-east-1.docdb.amazonaws.com:27017
```

For information about finding a cluster's endpoints, see [Finding a cluster's endpoints](db-cluster-endpoints-find.md).

### Reader endpoint
<a name="how-it-works.endpoints.reader"></a>

The *reader endpoint* load balances read-only connections across all available replicas in your cluster. If you connect to the reader endpoint in replica set mode (that is, with `replicaSet=rs0` in the connection string), it behaves like the cluster endpoint, and you can perform write operations on the primary. However, if you connect with `directConnection=true`, any attempt to perform a write operation over a connection to the reader endpoint results in an error. An Amazon DocumentDB cluster has exactly one reader endpoint.

If the cluster contains only one (primary) instance, the reader endpoint connects to the primary instance. When you add a replica instance to your Amazon DocumentDB cluster, the reader endpoint opens read-only connections to the new replica after it is active.

The following is an example reader endpoint for an Amazon DocumentDB cluster:

```
sample-cluster.cluster-ro-123456789012.us-east-1.docdb.amazonaws.com:27017
```

The following is an example connection string using a reader endpoint:

```
mongodb://username:password@sample-cluster.cluster-ro-123456789012.us-east-1.docdb.amazonaws.com:27017 
```

The reader endpoint load balances read-only connections, not read requests. If some reader endpoint connections are more heavily used than others, your read requests might not be equally balanced among the instances in the cluster. To distribute requests more evenly, we recommend connecting to the cluster endpoint as a replica set and using the `secondaryPreferred` read preference option. 
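The difference between balancing connections and balancing requests can be seen in a toy model. The following sketch assumes that new connections are handed out round-robin across replicas; the actual distribution mechanism behind the reader endpoint is not specified here, and the replica names and request counts are hypothetical.

```python
from collections import Counter
from itertools import cycle


def assign_connections(replicas: list, connection_count: int) -> list:
    """Toy model: each new connection goes to the next replica round-robin."""
    next_replica = cycle(replicas)
    return [next(next_replica) for _ in range(connection_count)]


def requests_per_replica(assignments: list, requests_per_connection: list) -> Counter:
    """Total read requests per replica, given how busy each connection is."""
    if len(assignments) != len(requests_per_connection):
        raise ValueError("need one request count per connection")
    totals = Counter()
    for replica, requests in zip(assignments, requests_per_connection):
        totals[replica] += requests
    return totals


conns = assign_connections(["replica-1", "replica-2"], 4)
print(Counter(conns))                                  # 2 connections each
print(requests_per_replica(conns, [900, 10, 50, 40]))  # 950 vs. 50 requests
```

Even though each replica holds the same number of connections, one busy connection concentrates most of the read traffic on a single replica. A driver connected in replica set mode with `secondaryPreferred` makes its routing decision per operation rather than once per connection.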

For information about finding a cluster's endpoints, see [Finding a cluster's endpoints](db-cluster-endpoints-find.md).

### Instance endpoint
<a name="how-it-works.endpoints.instance"></a>

An *instance endpoint* connects to a specific instance within your cluster. The instance endpoint for the current primary instance can be used for read and write operations. However, attempting to perform write operations to an instance endpoint for a read replica results in an error. An Amazon DocumentDB cluster has one instance endpoint per active instance.

An instance endpoint provides direct control over connections to a specific instance for scenarios in which the cluster endpoint or reader endpoint might not be appropriate. An example use case is provisioning for a periodic read-only analytics workload. You can provision a larger-than-normal replica instance, connect directly to the new larger instance with its instance endpoint, run the analytics queries, and then terminate the instance. Using the instance endpoint keeps the analytics traffic from impacting other cluster instances.

The following is an example instance endpoint for a single instance in an Amazon DocumentDB cluster:

```
sample-instance.123456789012.us-east-1.docdb.amazonaws.com:27017
```

The following is an example connection string using this instance endpoint:

```
mongodb://username:password@sample-instance.123456789012.us-east-1.docdb.amazonaws.com:27017 
```

**Note**  
An instance’s role as primary or replica can change due to a failover event. Your applications should never assume that a particular instance endpoint is the primary instance. We do not recommend connecting to instance endpoints for production applications. Instead, we recommend that you connect to your cluster using the cluster endpoint and in replica set mode (see [Connecting to Amazon DocumentDB as a replica set](connect-to-replica-set.md)). For more advanced control of instance failover priority, see [Understanding Amazon DocumentDB cluster fault tolerance](db-cluster-fault-tolerance.md). 

For information about finding an instance's endpoint, see [Finding an instance's endpoint](db-instance-endpoint-find.md).
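The three example endpoints in this section follow distinct hostname patterns: the cluster endpoint has a second label starting with `cluster-`, the reader endpoint has one starting with `cluster-ro-`, and the instance endpoint has neither. The following sketch splits an endpoint into host and port and guesses its type from those patterns. The patterns are inferred from the examples above rather than from a documented contract, so treat the classifier as a convenience for logging or sanity checks, not as an authoritative test.

```python
def parse_endpoint(endpoint: str) -> tuple:
    """Split 'host:port' into (host, port)."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return host, int(port)


def endpoint_kind(endpoint: str) -> str:
    """Guess the endpoint type from the hostname patterns in the examples above."""
    host, _port = parse_endpoint(endpoint)
    labels = host.split(".")
    second = labels[1] if len(labels) > 1 else ""
    # Check the more specific reader prefix first: it also starts with "cluster-".
    if second.startswith("cluster-ro-"):
        return "reader"
    if second.startswith("cluster-"):
        return "cluster"
    return "instance"


print(endpoint_kind("sample-cluster.cluster-ro-123456789012.us-east-1.docdb.amazonaws.com:27017"))
```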

### Replica set mode
<a name="replica-set-mode"></a>

You can connect to your Amazon DocumentDB cluster endpoint in replica set mode by specifying the replica set name `rs0`. Connecting in replica set mode provides the ability to specify the Read Concern, Write Concern, and Read Preference options. For more information, see [Read consistency](#durability-consistency-isolation.read-consistency).

The following is an example connection string connecting in replica set mode:

```
mongodb://username:password@sample-cluster.cluster-123456789012.us-east-1.docdb.amazonaws.com:27017/?replicaSet=rs0
```

When you connect in replica set mode, your Amazon DocumentDB cluster appears to your drivers and clients as a replica set. Instances added and removed from your Amazon DocumentDB cluster are reflected automatically in the replica set configuration.

Each Amazon DocumentDB cluster consists of a single replica set with the default name `rs0`. The replica set name cannot be modified.

Connecting to the cluster endpoint in replica set mode is the recommended method for general use.
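The replica set mode connection string above can also be assembled in code. The following is a minimal sketch that adds the `replicaSet=rs0` option and a read preference (defaulting to `secondaryPreferred`, as recommended earlier) and percent-encodes the credentials so that characters such as `@` or `:` in a password do not break the URI. The function name is hypothetical, and the sketch deliberately omits TLS options; see the TLS section for those.

```python
from urllib.parse import quote_plus, urlencode

READ_PREFERENCES = {"primary", "primaryPreferred", "secondary",
                    "secondaryPreferred", "nearest"}


def replica_set_uri(username: str, password: str, cluster_endpoint: str,
                    read_preference: str = "secondaryPreferred") -> str:
    """Build a replica-set-mode connection string for a cluster endpoint."""
    if read_preference not in READ_PREFERENCES:
        raise ValueError(f"unknown read preference: {read_preference}")
    # Percent-encode credentials so characters like '@' or ':' don't break the URI.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    options = urlencode({"replicaSet": "rs0", "readPreference": read_preference})
    return f"mongodb://{credentials}@{cluster_endpoint}/?{options}"


print(replica_set_uri("username", "password",
                      "sample-cluster.cluster-123456789012.us-east-1.docdb.amazonaws.com:27017"))
```

The resulting string can be passed to a MongoDB driver, for example `pymongo.MongoClient(uri)`.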

**Note**  
All instances in an Amazon DocumentDB cluster listen on the same TCP port for connections.

## TLS support
<a name="how-it-works.ssl"></a>

For more details on connecting to Amazon DocumentDB using Transport Layer Security (TLS), see [Encrypting data in transit](security.encryption.ssl.md).

## Amazon DocumentDB storage
<a name="how-it-works.storage"></a>

Amazon DocumentDB data is stored in a *cluster volume*, which is a single, virtual volume that uses solid state drives (SSDs). A cluster volume consists of six copies of your data, which are replicated automatically across multiple Availability Zones in a single AWS Region. This replication helps ensure that your data is highly durable, with less possibility of data loss. It also helps ensure that your cluster is more available during a failover because copies of your data already exist in other Availability Zones. These copies can continue to serve data requests to the instances in your Amazon DocumentDB cluster. 

### How data storage is billed
<a name="how-it-works-storage-billing"></a>

Amazon DocumentDB automatically increases the size of a cluster volume as the amount of data increases. An Amazon DocumentDB cluster volume can grow to a maximum size of 128 TiB; however, you are charged only for the space that you use. Starting with Amazon DocumentDB 4.0, when data is removed, such as by dropping a collection or index, the overall allocated space decreases by a comparable amount. Thus, you can reduce storage charges by deleting collections, indexes, and databases that you no longer need. In Amazon DocumentDB 3.6, the cluster volume can reuse space that's freed up when you remove data, but the volume itself never decreases in size. As a result, in version 3.6 you might not see any change in storage when you drop a collection or index, even though the freed-up space is reused. 

**Note**  
With Amazon DocumentDB 3.6, storage costs are based on the storage "high water mark" (the maximum amount that was allocated for the Amazon DocumentDB cluster at any point in time). You can manage costs by avoiding ETL practices that create large volumes of temporary information, or that load large volumes of new data before removing unneeded older data. If removing data from an Amazon DocumentDB cluster leaves a substantial amount of allocated but unused space, resetting the high water mark requires a logical data dump and restore to a new cluster, using tools such as `mongodump` and `mongorestore`. Creating and restoring a snapshot does not reduce the allocated storage, because the physical layout of the underlying storage remains the same in the restored snapshot.
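
The difference between the two behaviors can be illustrated with a simplified JavaScript model. This is a sketch, not the AWS billing formula. It assumes you sample the amount of data stored at a few points in time (in GiB), treats version 3.6 allocation as the running maximum of those samples (the high water mark), and treats 4.0 and later allocation as tracking the current data size.

```javascript
// Simplified, illustrative model of allocated storage (not the actual billing formula).
// dataSizesGiB: amount of data stored at successive points in time, oldest first.
function allocatedStorageGiB(dataSizesGiB, engineVersion) {
  if (dataSizesGiB.length === 0) return 0;
  if (engineVersion === "3.6") {
    // 3.6: freed space is reused, but the volume never shrinks (high water mark).
    return Math.max(...dataSizesGiB);
  }
  // 4.0 and later: allocated space decreases when data is removed.
  return dataSizesGiB[dataSizesGiB.length - 1];
}

// An ETL job that loads 300 GiB of new data before removing 300 GiB of old data.
const loadThenDelete = [300, 600, 300];
// The same job, removing the old data first and then loading the new data.
const deleteThenLoad = [300, 0, 300];

console.log(allocatedStorageGiB(loadThenDelete, "3.6")); // 600
console.log(allocatedStorageGiB(deleteThenLoad, "3.6")); // 300
console.log(allocatedStorageGiB(loadThenDelete, "4.0")); // 300
```

Under this model, removing old data before loading new data keeps the 3.6 high water mark at 300 GiB instead of 600 GiB, which is the ETL ordering that the preceding note recommends.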

**Note**  
Utilities such as `mongodump` and `mongorestore` incur I/O charges based on the size of the data that they read from and write to the storage volume.

For information about Amazon DocumentDB data storage and I/O pricing, see [Amazon DocumentDB (with MongoDB compatibility) pricing](https://aws.amazon.com/documentdb/pricing) and [Pricing FAQs](https://aws.amazon.com/documentdb/faqs/#Pricing).

## Amazon DocumentDB replication
<a name="how-it-works.replication"></a>

In an Amazon DocumentDB cluster, each replica instance exposes an independent endpoint. These replica endpoints provide read-only access to the data in the cluster volume. They enable you to scale the read workload for your data over multiple replicated instances. They also help improve the performance of data reads and increase the availability of the data in your Amazon DocumentDB cluster. Amazon DocumentDB replicas are also failover targets and are quickly promoted if the primary instance for your Amazon DocumentDB cluster fails. 

## Amazon DocumentDB reliability
<a name="how-it-works.reliability"></a>

Amazon DocumentDB is designed to be reliable, durable, and fault tolerant. (To improve availability, you should configure your Amazon DocumentDB cluster so that it has multiple replica instances in different Availability Zones.) Amazon DocumentDB includes several automatic features that make it a reliable database solution. 

### Storage auto-repair
<a name="how-it-works.reliability.storage-auto-repair"></a>

Amazon DocumentDB maintains multiple copies of your data in three Availability Zones, greatly reducing the chance of losing data due to a storage failure. Amazon DocumentDB automatically detects failures in the cluster volume. When a segment of a cluster volume fails, Amazon DocumentDB immediately repairs the segment. It uses the data from the other volumes that make up the cluster volume to help ensure that the data in the repaired segment is current. As a result, Amazon DocumentDB avoids data loss and reduces the need to perform a point-in-time restore to recover from an instance failure. 

### Survivable cache warming
<a name="how-it-works.reliability.survivable-cache-warming"></a>

Amazon DocumentDB manages its page cache in a separate process from the database so that the page cache can survive independently of the database. In the unlikely event of a database failure, the page cache remains in memory. This ensures that the buffer pool is warmed with the most current state when the database restarts.

### Crash recovery
<a name="how-it-works.reliability.crash-recovery"></a>

Amazon DocumentDB is designed to recover from a crash almost instantaneously, and to continue serving your application data. Amazon DocumentDB performs crash recovery asynchronously on parallel threads so that your database is open and available almost immediately after a crash. 

### Resource governance
<a name="how-it-works.reliability.resource-governance"></a>

Amazon DocumentDB safeguards resources that are needed to run critical processes in the service, such as health checks. To do this, Amazon DocumentDB throttles requests when an instance is experiencing high memory pressure. As a result, some operations may be queued to wait for the memory pressure to subside. If memory pressure continues, queued operations may time out. You can monitor whether the service is throttling operations due to low memory with the following CloudWatch metrics: `LowMemThrottleQueueDepth`, `LowMemThrottleMaxQueueDepth`, `LowMemNumOperationsThrottled`, and `LowMemNumOperationsTimedOut`. For more information, see Monitoring Amazon DocumentDB with CloudWatch. If the LowMem CloudWatch metrics show sustained memory pressure on your instance, we advise that you scale up your instance to provide additional memory for your workload.
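
A minimal JavaScript sketch of how you might decide whether throttling is sustained. It assumes you have already retrieved per-period `Sum` datapoints for `LowMemNumOperationsThrottled` from CloudWatch (retrieval isn't shown). The window of five consecutive periods is a hypothetical threshold, not an AWS recommendation.

```javascript
// Illustrative check with a hypothetical threshold: memory pressure counts as
// "sustained" when operations were throttled in each of the most recent periods.
// throttledPerPeriod: LowMemNumOperationsThrottled sums, oldest period first.
function isSustainedMemoryPressure(throttledPerPeriod, windowSize = 5) {
  if (throttledPerPeriod.length < windowSize) return false; // not enough data yet
  return throttledPerPeriod.slice(-windowSize).every((count) => count > 0);
}

console.log(isSustainedMemoryPressure([0, 0, 3, 4, 2, 5, 1])); // true
console.log(isSustainedMemoryPressure([3, 4, 0, 5, 1, 2])); // false
```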

## Read preference options
<a name="durability-consistency-isolation"></a>

Amazon DocumentDB uses a cloud-native shared storage service that replicates data six times across three Availability Zones to provide high levels of durability. Amazon DocumentDB does not rely on replicating data to multiple instances to achieve durability. Your cluster’s data is durable whether it contains a single instance or 15 instances.

**Topics**
+ [Write durability](#durability-consistency-isolation.write-durability)
+ [Read isolation](#durability-consistency-isolation.read-isolation)
+ [Read consistency](#durability-consistency-isolation.read-consistency)
+ [High availability](#durability-consistency-isolation.high-availability)
+ [Scaling reads](#durability-consistency-isolation.scaling-reads)

### Write durability
<a name="durability-consistency-isolation.write-durability"></a>

Amazon DocumentDB uses a unique, distributed, fault-tolerant, self-healing storage system. This system replicates six copies (V=6) of your data across three AWS Availability Zones to provide high availability and durability. When writing data, Amazon DocumentDB ensures that all writes are durably recorded on a majority of nodes before acknowledging the write to the client. For comparison, if you are running a three-node MongoDB replica set, a write concern of `{w:3, j:true}` is the closest equivalent to this behavior.

Writes to an Amazon DocumentDB cluster must be processed by the cluster’s writer instance. Attempting to write to a reader results in an error. An acknowledged write from an Amazon DocumentDB primary instance is durable and can't be rolled back. Amazon DocumentDB is highly durable by default and doesn't support a non-durable write option. You can't modify the durability level (that is, the write concern): Amazon DocumentDB ignores any `w` value that you specify, writes effectively behave as `{w: 3, j: true}`, and you can't reduce this.

Because storage and compute are separated in the Amazon DocumentDB architecture, a cluster with a single instance is highly durable. Durability is handled at the storage layer. As a result, an Amazon DocumentDB cluster with a single instance and one with three instances achieve the same level of durability. You can configure your cluster to your specific use case while still providing high durability for your data.

Writes to an Amazon DocumentDB cluster are atomic within a single document. 

Amazon DocumentDB does not support the `wtimeout` option and will not return an error if a value is specified. Writes to the primary Amazon DocumentDB instance are guaranteed not to block indefinitely.

### Read isolation
<a name="durability-consistency-isolation.read-isolation"></a>

Reads from an Amazon DocumentDB instance return only data that was durable before the query began. Reads never return data modified after the query begins execution, and dirty reads are not possible under any circumstances.

### Read consistency
<a name="durability-consistency-isolation.read-consistency"></a>

Data read from an Amazon DocumentDB cluster is durable and will not be rolled back. You can modify the read consistency for Amazon DocumentDB reads by specifying the read preference for the request or connection. Amazon DocumentDB does not support a non-durable read option.

Reads from an Amazon DocumentDB cluster’s primary instance are strongly consistent under normal operating conditions and have read-after-write consistency. If a failover event occurs between the write and a subsequent read, the system can briefly return a read that is not strongly consistent. All reads from a read replica are eventually consistent and return the data in the same order, often with less than 100 ms of replica lag.

#### Amazon DocumentDB read preferences
<a name="durability-consistency-isolation.read-preferences"></a>

Amazon DocumentDB supports setting a read preference option only when reading data from the cluster endpoint in replica set mode. Setting a read preference option affects how your MongoDB client or driver routes read requests to instances in your Amazon DocumentDB cluster. You can set read preference options for a specific query, or as a general option in your MongoDB driver. (Consult your client or driver’s documentation for instructions on how to set a read preference option.)

If your client or driver is not connecting to an Amazon DocumentDB cluster endpoint in replica set mode, the result of specifying a read preference is undefined.

Amazon DocumentDB does not support setting *tag sets* as a read preference.

**Supported Read Preference Options**
+ **`primary`**—Specifying a `primary` read preference helps ensure that all reads are routed to the cluster’s primary instance. If the primary instance is unavailable, the read operation fails. A `primary` read preference yields read-after-write consistency and is appropriate for use cases that prioritize read-after-write consistency over high availability and read scaling.

  The following example specifies a `primary` read preference:

  ```
  db.example.find().readPref('primary')
  ```

   
+ **`primaryPreferred`**—Specifying a `primaryPreferred` read preference routes reads to the primary instance under normal operation. If there is a primary failover, the client routes requests to a replica. A `primaryPreferred` read preference yields read-after-write consistency during normal operation, and eventually consistent reads during a failover event. A `primaryPreferred` read preference is appropriate for use cases that prioritize read-after-write consistency over read scaling, but still require high availability.

  The following example specifies a `primaryPreferred` read preference:

  ```
  db.example.find().readPref('primaryPreferred')
  ```

   
+ **`secondary`**—Specifying a `secondary` read preference ensures that reads are only routed to a replica, never the primary instance. If there are no replica instances in a cluster, the read request fails. A `secondary` read preference yields eventually consistent reads and is appropriate for use cases that prioritize primary instance write throughput over high availability and read-after-write consistency.

  The following example specifies a `secondary` read preference:

  ```
  db.example.find().readPref('secondary')
  ```

   
+ **`secondaryPreferred`**—Specifying a `secondaryPreferred` read preference ensures that reads are routed to a read replica when one or more replicas are active. If there are no active replica instances in a cluster, the read request is routed to the primary instance. A `secondaryPreferred` read preference yields eventually consistent reads when the read is serviced by a read replica. It yields read-after-write consistency when the read is serviced by the primary instance (barring failover events). A `secondaryPreferred` read preference is appropriate for use cases that prioritize read scaling and high availability over read-after-write consistency.

  The following example specifies a `secondaryPreferred` read preference:

  ```
  db.example.find().readPref('secondaryPreferred')
  ```

   
+ **`nearest`**—Specifying a `nearest` read preference routes reads based solely on the measured latency between the client and all instances in the Amazon DocumentDB cluster. A `nearest` read preference yields eventually consistent reads when the read is serviced by a read replica. It yields read-after-write consistency when the read is serviced by the primary instance (barring failover events). A `nearest` read preference is appropriate for use cases that prioritize achieving the lowest possible read latency and high availability over read-after-write consistency and read scaling.

  The following example specifies a `nearest` read preference:

  ```
  db.example.find().readPref('nearest')
  ```
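
To make the routing rules above concrete, here is a simplified JavaScript model of how a driver chooses an instance for each supported mode. It is a sketch under stated assumptions: instances are plain objects with a hypothetical shape, `null` means the read fails, and the lowest-latency eligible instance is always chosen, whereas real drivers typically choose among eligible instances within a latency window.

```javascript
// Simplified model of read preference routing (not a real driver implementation).
// Each instance: { id, role: "primary" | "replica", available: boolean, latencyMs: number }
function selectInstance(mode, instances) {
  const up = instances.filter((i) => i.available);
  const primary = up.find((i) => i.role === "primary") ?? null;
  const replicas = up.filter((i) => i.role === "replica");
  // Pick the instance with the lowest measured latency, or null if the list is empty.
  const lowestLatency = (list) =>
    list.length === 0 ? null : list.reduce((best, i) => (i.latencyMs < best.latencyMs ? i : best));

  switch (mode) {
    case "primary": // reads fail if the primary is unavailable
      return primary;
    case "primaryPreferred": // fall back to a replica during a failover
      return primary ?? lowestLatency(replicas);
    case "secondary": // never the primary; fails if there are no replicas
      return lowestLatency(replicas);
    case "secondaryPreferred": // replicas first, primary only if no replica is active
      return replicas.length > 0 ? lowestLatency(replicas) : primary;
    case "nearest": // latency only, regardless of role
      return lowestLatency(up);
    default:
      throw new Error(`Unsupported read preference: ${mode}`);
  }
}

const cluster = [
  { id: "instance-1", role: "primary", available: true, latencyMs: 5 },
  { id: "instance-2", role: "replica", available: true, latencyMs: 2 },
  { id: "instance-3", role: "replica", available: true, latencyMs: 8 },
];
console.log(selectInstance("secondary", cluster).id); // instance-2
```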

### High availability
<a name="durability-consistency-isolation.high-availability"></a>

Amazon DocumentDB supports highly available cluster configurations by using replicas as failover targets for the primary instance. If the primary instance fails, an Amazon DocumentDB replica is promoted as the new primary, with a brief interruption during which read and write requests made to the primary instance fail with an exception.

If your Amazon DocumentDB cluster doesn't include any replicas, the primary instance is re-created during a failure. However, promoting an Amazon DocumentDB replica is much faster than re-creating the primary instance. So we recommend that you create one or more Amazon DocumentDB replicas as failover targets.

Replicas that are intended for use as failover targets should be of the same instance class as the primary instance. They should be provisioned in different Availability Zones from the primary. You can control which replicas are preferred as failover targets. For best practices on configuring Amazon DocumentDB for high availability, see [Understanding Amazon DocumentDB cluster fault tolerance](db-cluster-fault-tolerance.md).
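
Because requests fail with an exception during the brief failover interruption, one way to ride out a failover is to retry in the application. The following JavaScript sketch wraps an async operation in a retry loop with exponential backoff; the helper name, attempt count, and delays are hypothetical choices. It retries on any error for simplicity, whereas a production version would retry only on network or failover errors. Retrying a non-idempotent write can apply it twice, so this pattern is safest for reads and idempotent writes.

```javascript
// Hypothetical helper: retries an async operation with exponential backoff so that a
// brief failover interruption shows up as extra latency instead of an error.
async function withRetry(operation, { attempts = 5, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        // Wait 100 ms, 200 ms, 400 ms, ... (with the default base delay) before retrying.
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt failed; surface the last error
}

// Hypothetical usage with a MongoDB driver collection object (not run here):
// const doc = await withRetry(() => collection.findOne({ _id: "9876543210123" }));
```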

### Scaling reads
<a name="durability-consistency-isolation.scaling-reads"></a>

Amazon DocumentDB replicas are ideal for read scaling. They are fully dedicated to read operations on your cluster volume, that is, replicas do not process writes. Data replication happens within the cluster volume and not between instances. So each replica’s resources are dedicated to processing your queries, not replicating and writing data.

If your application needs more read capacity, you can add a replica to your cluster quickly (usually in less than ten minutes). If your read capacity requirements diminish, you can remove unneeded replicas. With Amazon DocumentDB replicas, you pay only for the read capacity that you need.

Amazon DocumentDB supports client-side read scaling through the use of Read Preference options. For more information, see [Amazon DocumentDB read preferences](#durability-consistency-isolation.read-preferences).

## TTL deletes
<a name="how-it-works.ttl-deletes"></a>

Deletes from a TTL index are performed by a background process on a best-effort basis and are not guaranteed to occur within a specific timeframe. Factors such as instance size, instance resource utilization, document size, and overall throughput can affect the timing of a TTL delete.

When the TTL monitor deletes your documents, each deletion incurs I/O costs, which increase your bill. If throughput and TTL delete rates increase, expect a corresponding increase in your bill due to higher I/O usage.

When you create a TTL index on an existing collection, you must delete all expired documents before creating the index. The current TTL implementation is optimized for deleting a small fraction of the documents in a collection, which is typical if TTL was enabled on the collection from the start. Deleting a large number of documents at once may result in higher IOPS than necessary.

If you don't want to use a TTL index to delete documents, you can instead segment documents into collections based on time, and drop those collections when the documents are no longer needed. For example, you can create one collection per week and drop it without incurring I/O costs. This can be significantly more cost-effective than using a TTL index.
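
A minimal JavaScript sketch of the one-collection-per-week approach, assuming a hypothetical naming scheme `<prefix>_<YYYYMMDD>` in which the date is the Monday (UTC) that starts the week. The helpers only compute names; you would still drop each returned collection yourself, for example with `db.getCollection(name).drop()` in the `mongo` shell.

```javascript
// Returns the UTC date (YYYYMMDD) of the Monday that starts the week containing `date`.
function weekStartUtc(date) {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday, 1 = Monday
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d.toISOString().slice(0, 10).replace(/-/g, "");
}

// Hypothetical naming scheme: one collection per week, for example "events_20240101".
function collectionForDate(prefix, date) {
  return `${prefix}_${weekStartUtc(date)}`;
}

// Returns the weekly collections whose entire week falls before the retention window.
function collectionsToDrop(prefix, existingNames, now, retentionWeeks) {
  const windowStart = new Date(now.getTime() - retentionWeeks * 7 * 24 * 60 * 60 * 1000);
  const cutoff = weekStartUtc(windowStart);
  return existingNames.filter(
    // YYYYMMDD strings sort chronologically, so a string comparison is enough.
    (name) => name.startsWith(`${prefix}_`) && name.slice(prefix.length + 1) < cutoff
  );
}

console.log(collectionForDate("events", new Date(Date.UTC(2024, 0, 7)))); // events_20240101
```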

## Billable resources
<a name="billing"></a>

### Identifying billable Amazon DocumentDB resources
<a name="billing.identifying-billable-resources"></a>

As a fully managed database service, Amazon DocumentDB charges for instances, storage, I/Os, backups, and data transfer. For more information, see [Amazon DocumentDB (with MongoDB compatibility) pricing](https://aws.amazon.com/documentdb/pricing/). 

To discover billable resources in your account, and delete them if you no longer need them, you can use the AWS Management Console or the AWS CLI.

#### Using the AWS Management Console
<a name="billing.identifying-billable-resources-con"></a>

Using the AWS Management Console, you can discover the Amazon DocumentDB clusters, instances, and snapshots that you have provisioned for a given AWS Region.

**To discover clusters, instances, and snapshots**

1. Sign in to the AWS Management Console, and open the Amazon DocumentDB console at [https://console.aws.amazon.com/docdb](https://console.aws.amazon.com/docdb).

1. To discover billable resources in a Region other than your default Region, in the upper-right corner of the screen, choose the AWS Region that you want to search.   
![The North Virginia Region in the Region selector.](http://docs.aws.amazon.com/documentdb/latest/developerguide/images/db-cluster-console-region.png)

1. In the navigation pane, choose the type of billable resource that you're interested in: **Clusters**, **Instances**, or **Snapshots**.  
![Clusters, instances, and snapshots in the navigation pane.](http://docs.aws.amazon.com/documentdb/latest/developerguide/images/db-navigation-pane-clusters-instances-snapshots.png)

1. All your provisioned clusters, instances, or snapshots for the Region are listed in the right pane. You will be charged for clusters, instances, and snapshots.

#### Using the AWS CLI
<a name="billing.identifying-billable-resources-cli"></a>

Using the AWS CLI, you can discover the Amazon DocumentDB clusters, instances, and snapshots that you have provisioned for a given AWS Region.

**To discover clusters and instances**  
The following code lists all your clusters and instances for the specified Region. If you want to search for clusters and instances in your default Region, you can omit the `--region` parameter.

**Example**  
For Linux, macOS, or Unix:  

```
aws docdb describe-db-clusters \
    --region us-east-1 \
    --query 'DBClusters[?Engine==`docdb`]' | \
       grep -e "DBClusterIdentifier" -e "DBInstanceIdentifier"
```
For Windows:  

```
aws docdb describe-db-clusters ^
    --region us-east-1 ^
    --query "DBClusters[?Engine==`docdb`]" | ^
       findstr "DBClusterIdentifier DBInstanceIdentifier"
```
Output from this operation looks something like the following.  

```
"DBClusterIdentifier": "docdb-2019-01-09-23-55-38",
        "DBInstanceIdentifier": "docdb-2019-01-09-23-55-38",
        "DBInstanceIdentifier": "docdb-2019-01-09-23-55-382",
"DBClusterIdentifier": "sample-cluster",
"DBClusterIdentifier": "sample-cluster2",
```
**To discover snapshots**  
The following code lists all your snapshots for the specified Region. If you want to search for snapshots in your default Region, you can omit the `--region` parameter.
For Linux, macOS, or Unix:  

```
aws docdb describe-db-cluster-snapshots \
  --region us-east-1 \
  --query 'DBClusterSnapshots[?Engine==`docdb`].[DBClusterSnapshotIdentifier,SnapshotType]'
```
For Windows:  

```
aws docdb describe-db-cluster-snapshots ^
  --region us-east-1 ^
  --query "DBClusterSnapshots[?Engine==`docdb`].[DBClusterSnapshotIdentifier,SnapshotType]"
```
Output from this operation looks something like the following.  

```
[
    [
        "rds:docdb-2019-01-09-23-55-38-2019-02-13-00-06",
        "automated"
    ],
    [
        "test-snap",
        "manual"
    ]
]
```
You need to delete only `manual` snapshots. Automated snapshots are deleted when you delete the cluster.
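
If you capture the JSON output of the snapshot command as a string, a short script can pick out the snapshots that you need to delete. A minimal JavaScript sketch, assuming the output is the list of `[identifier, type]` pairs produced by the `--query` expression in that command:

```javascript
// Parses the output of the describe-db-cluster-snapshots command
// (a JSON array of [DBClusterSnapshotIdentifier, SnapshotType] pairs)
// and returns the identifiers of the manual snapshots.
function manualSnapshotIds(cliOutputJson) {
  const pairs = JSON.parse(cliOutputJson);
  return pairs.filter(([, type]) => type === "manual").map(([id]) => id);
}

const sampleOutput = `[
  ["rds:docdb-2019-01-09-23-55-38-2019-02-13-00-06", "automated"],
  ["test-snap", "manual"]
]`;
console.log(manualSnapshotIds(sampleOutput));
```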

### Deleting unwanted billable resources
<a name="billing.deleting-billable-resources"></a>

To delete a cluster, you must first delete all the instances in the cluster.
+ To delete instances, see [Deleting an Amazon DocumentDB instance](db-instance-delete.md).

  **Important**  
  Even if you delete the instances in a cluster, you are still billed for the storage and backup usage associated with that cluster. To stop all charges, you must also delete your cluster and manual snapshots.
+ To delete clusters, see [Deleting an Amazon DocumentDB cluster](db-cluster-delete.md). 
+ To delete manual snapshots, see [Deleting a cluster snapshot](backup_restore-delete_cluster_snapshot.md). 

# What is a document database?
<a name="what-is-document-db"></a>

Some developers don't think of their data model in terms of normalized rows and columns. Typically, in the application tier, data is represented as a JSON document because it is more intuitive for developers to think of their data model as a document. 

The popularity of document databases has grown because they let you persist data in a database by using the same document model format that you use in your application code. Document databases provide powerful and intuitive APIs for flexible and agile development.

**Topics**
+ [Use cases](document-database-use-cases.md)
+ [Understanding documents](document-database-documents-understanding.md)
+ [Working with documents](document-database-working-with-documents.md)

# Document database use cases
<a name="document-database-use-cases"></a>

Your use case drives whether you need a document database or some other type of database for managing your data. Document databases are useful for workloads that require a flexible schema for fast, iterative development. The following are some examples of use cases for which document databases can provide significant advantages:

**Topics**
+ [User profiles](#document-databases-use-cases.user-profiles)
+ [Real-time big data](#document-databases-use-cases.big-data)
+ [Content management](#document-databases-use-cases.content-management)

## User profiles
<a name="document-databases-use-cases.user-profiles"></a>

Because document databases have a flexible schema, they can store documents that have different attributes and data values. Document databases are a practical solution for online profiles in which different users provide different types of information. Using a document database, you can store each user's profile efficiently by storing only the attributes that are specific to each user.

Suppose that a user elects to add or remove information from their profile. In this case, their document can easily be replaced with an updated version that contains any newly added attributes and data and omits any removed ones. Document databases easily manage this level of individuality and fluidity.

## Real-time big data
<a name="document-databases-use-cases.big-data"></a>

Historically, the ability to extract information from operational data was hampered by the fact that operational databases and analytical databases were maintained in different environments—operational and business/reporting respectively. Being able to extract operational information in real time is critical in a highly competitive business environment. By using document databases, a business can store and manage operational data from any source and concurrently feed the data to the BI engine of choice for analysis. There is no requirement to have two environments.

## Content management
<a name="document-databases-use-cases.content-management"></a>

To effectively manage content, you must be able to collect and aggregate content from a variety of sources, and then deliver it to the customer. Because of their flexible schema, document databases are well suited for collecting and storing any type of data. You can use them to create and incorporate new types of content, including user-generated content such as images, comments, and videos.

# Understanding documents
<a name="document-database-documents-understanding"></a>

Document databases are used for storing semistructured data as documents, rather than normalizing data across multiple tables, each with a unique and fixed structure, as in a relational database. Documents stored in a document database use nested key-value pairs to provide the document's structure or schema. Because each document is self-describing, different types of documents can be stored in the same document database, which meets the requirement for processing similar data that is in different formats. For example, the JSON-encoded documents for an online store that are described in the topic [Example documents in a document database](#document-database-documents) can be stored in the same document database.

**Topics**
+ [SQL vs. non-relational terminology](#document-database-sql-vs-nosql-terms)
+ [Simple documents](#document-database-documents-simple)
+ [Embedded documents](#document-database-documents-embeded)
+ [Example documents in a document database](#document-database-documents)
+ [Understanding normalization in a document database](#document-database-normalization)

## SQL vs. non-relational terminology
<a name="document-database-sql-vs-nosql-terms"></a>

The following table compares terminology used by document databases (MongoDB) with terminology used by SQL databases.


|  SQL  |  MongoDB  | 
| --- | --- | 
|  Table  |  Collection  | 
|  Row  |  Document  | 
|  Column  |  Field  | 
|  Primary key  |  `_id` field (often an ObjectId)  | 
|  Index  |  Index  | 
|  View  |  View  | 
|  Nested table or object  |  Embedded document  | 
|  Array  |  Array  | 

## Simple documents
<a name="document-database-documents-simple"></a>

All documents in a document database are self-describing. This documentation uses JSON-like formatted documents, although you can use other means of encoding.

A simple document has one or more fields that are all at the same level within the document. In the following example, the fields `SSN`, `LName`, `FName`, `DOB`, `Street`, `City`, `State-Province`, `PostalCode`, and `Country` are all siblings within the document.

```
{
   "SSN": "123-45-6789",
   "LName": "Rivera",
   "FName": "Martha",
   "DOB": "1992-11-16",
   "Street": "125 Main St.",
   "City": "Anytown",
   "State-Province": "WA",
   "PostalCode": "98117",
   "Country": "USA"
}
```

When information is organized in a simple document, each field is managed individually. To retrieve a person's address, you must retrieve `Street`, `City`, `State-Province`, `PostalCode`, and `Country` as individual data items.

## Embedded documents
<a name="document-database-documents-embeded"></a>

A complex document organizes its data by creating embedded documents within the document. Embedded documents help manage data in groupings and as individual data items, whichever is more efficient in a given case. Using the preceding example, you could embed an `Address` document in the main document. Doing this results in the following document structure:

```
{
   "SSN": "123-45-6789",
   "LName": "Rivera",
   "FName": "Martha",
   "DOB": "1992-11-16",
   "Address": 
   {
       "Street": "125 Main St.",
       "City": "Anytown",
       "State-Province": "WA",
       "PostalCode": "98117",
       "Country": "USA" 
   }
}
```

You can now access the data in the document as individual fields (`SSN`), as an embedded document (`Address`), or as a member of an embedded document (`Address.Street`).
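
The following JavaScript sketch shows both ideas with plain objects. A hypothetical `embedAddress` helper moves the five address fields of the simple document into an `Address` embedded document, and a hypothetical `getPath` helper resolves dot-notation paths such as `Address.Street`. Dot notation is the form that MongoDB-compatible query filters use to reach fields inside embedded documents, for example `db.example.find({ "Address.City": "Anytown" })`.

```javascript
const ADDRESS_FIELDS = ["Street", "City", "State-Province", "PostalCode", "Country"];

// Hypothetical helper: moves the sibling address fields of a simple document
// into an embedded "Address" document, leaving the other fields at the top level.
function embedAddress(simpleDoc) {
  const result = {};
  const address = {};
  for (const [key, value] of Object.entries(simpleDoc)) {
    if (ADDRESS_FIELDS.includes(key)) {
      address[key] = value;
    } else {
      result[key] = value;
    }
  }
  result.Address = address;
  return result;
}

// Hypothetical helper: resolves a dot-notation path such as "Address.Street";
// returns undefined if any part of the path is missing.
function getPath(doc, path) {
  return path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

const person = embedAddress({
  SSN: "123-45-6789",
  LName: "Rivera",
  FName: "Martha",
  DOB: "1992-11-16",
  Street: "125 Main St.",
  City: "Anytown",
  "State-Province": "WA",
  PostalCode: "98117",
  Country: "USA",
});

console.log(getPath(person, "SSN")); // individual field
console.log(getPath(person, "Address")); // the whole embedded document
console.log(getPath(person, "Address.Street")); // member of the embedded document
```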

## Example documents in a document database
<a name="document-database-documents"></a>

As stated earlier, because each document in a document database is self-describing, the structure of documents within a document database can be different from one another. The following two documents, one for a book and another for a periodical, are different structurally. Yet both of them can be in the same document database.

The following is a sample book document:

```
{
    "_id" : "9876543210123",
    "Type": "book",
    "ISBN": "987-6-543-21012-3",
    "Author": 
    {
        "LName":"Roe",
        "MI": "T",
        "FName": "Richard" 
    },
    "Title": "Understanding Document Databases"
}
```

The following is a sample periodical document with two articles:

```
{
    "_id" : "0123456789012",
    "Publication": "Programming Today",
    "Issue": 
    {
        "Volume": "14",
        "Number": "09"
    },
    "Articles" : [ 
        {
            "Title": "Is a Document Database Your Best Solution?",
            "Author": 
            {
                "LName": "Major",
                "FName": "Mary" 
            }
        },
        {
            "Title": "Databases for Online Solutions",
            "Author": 
            {
                "LName": "Stiles",
                "FName": "John" 
            }
        }
    ],
    "Type": "periodical"
}
```

Compare the structure of these two documents. With a relational database, you need either separate "periodical" and "books" tables, or a single table in which unused fields, such as "Publication," "Issue," "Articles," and "MI," are stored as `null` values. Because document databases are semistructured, with each document defining its own structure, these two documents can coexist in the same document database with no `null` fields. Document databases are good at handling sparse data.

Developing against a document database enables quick, iterative development. This is because you can change the data structure of a document dynamically, without having to change the schema for the entire collection. Document databases are well suited for agile development and dynamically changing environments.
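
As a small JavaScript illustration, the following sketch keeps abridged copies of the two sample documents in one in-memory array standing in for a collection. The hypothetical `withField` helper mimics the intent of a MongoDB `{ field: { $exists: true } }` filter, and adding a hypothetical `Edition` field to the book changes only that document.

```javascript
// In-memory stand-in for one collection holding both sample documents (abridged).
const readingMaterial = [
  { _id: "9876543210123", Type: "book", ISBN: "987-6-543-21012-3", Title: "Understanding Document Databases" },
  { _id: "0123456789012", Type: "periodical", Publication: "Programming Today", Issue: { Volume: "14", Number: "09" } },
];

// Returns the documents that define `field`, similar in spirit to { field: { $exists: true } }.
function withField(docs, field) {
  return docs.filter((doc) => Object.prototype.hasOwnProperty.call(doc, field));
}

// Change one document's structure; no collection-wide schema change is needed.
readingMaterial[0].Edition = "2"; // hypothetical new field, added to the book only

console.log(withField(readingMaterial, "Issue").map((d) => d._id)); // the periodical only
console.log(withField(readingMaterial, "Edition").map((d) => d._id)); // the book only
```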

## Understanding normalization in a document database
<a name="document-database-normalization"></a>

Document databases are not normalized; data found in one document can be repeated in another document. Further, some data discrepancies can exist between documents. For example, consider the scenario in which you make a purchase at an online store and all the details of your purchases are stored in a single document. The document might look something like the following JSON document:

```
{
    "DateTime": "2018-08-15T12:13:10Z",
    "LName" : "Santos",
    "FName" : "Paul",
    "Cart" : [ 
        {
            "ItemId" : "9876543210123",
            "Description" : "Understanding Document Databases",
            "Price" : "29.95"
        },
        {
            "ItemId" : "0123456789012",
            "Description" : "Programming Today",
            "Issue": {
                "Volume": "14",
                "Number": "09"
            },
            "Price" : "8.95"
        },
        {
            "ItemId": "234567890-K",
            "Description": "Gel Pen (black)",
            "Price": "2.49" 
        }
    ],
    "PaymentMethod" : 
    {
        "Issuer" : "MasterCard",
        "Number" : "1234-5678-9012-3456" 
    },
    "ShopperId" : "1234567890" 
}
```

All this information is stored as a document in a transaction collection. Later, you realize that you forgot to purchase one item. So you log on to the same store again and make another purchase, which is also stored as a document in the transaction collection.

```
{
    "DateTime": "2018-08-15T14:49:00Z",
    "LName" : "Santos",
    "FName" : "Paul",
    "Cart" : [ 
        {
            "ItemId" : "2109876543210",
            "Description" : "Document Databases for Fun and Profit",
            "Price" : "45.95"
        } 
    ],
    "PaymentMethod" : 
    {
        "Issuer" : "Visa",
        "Number" : "0987-6543-2109-8765" 
    },
    "ShopperId" : "1234567890" 
}
```

Notice the redundancy between these two documents—your name and shopper ID (and, if you used the same credit card, your credit card information). But that's okay because storage is inexpensive, and each document completely records a single transaction that can be retrieved quickly with a simple key-value query that requires no joins.
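
Because each transaction document is self-contained, values such as an order total can be computed from that one document without joins. In this JavaScript sketch, `orderTotal` is a hypothetical helper, and the prices are strings, as in the sample documents, so they're converted to integer cents before summing to avoid floating-point rounding errors.

```javascript
// Hypothetical helper: totals the Cart of a single transaction document.
// Prices are strings in the sample documents, so convert each to integer cents first.
function orderTotal(orderDoc) {
  const cents = orderDoc.Cart.reduce((sum, item) => sum + Math.round(Number(item.Price) * 100), 0);
  return (cents / 100).toFixed(2);
}

// Abridged copy of the first sample transaction.
const firstOrder = {
  DateTime: "2018-08-15T12:13:10Z",
  ShopperId: "1234567890",
  Cart: [
    { ItemId: "9876543210123", Description: "Understanding Document Databases", Price: "29.95" },
    { ItemId: "0123456789012", Description: "Programming Today", Price: "8.95" },
    { ItemId: "234567890-K", Description: "Gel Pen (black)", Price: "2.49" },
  ],
};
console.log(orderTotal(firstOrder)); // 41.39
```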

There is also an apparent discrepancy between the two documents—your credit card information. This is only an apparent discrepancy because it is likely that you used a different credit card for each purchase. Each document is accurate for the transaction that it documents.

# Working with documents
<a name="document-database-working-with-documents"></a>

As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data. In Amazon DocumentDB, a collection is analogous to a table in a relational database, except there is no single schema enforced upon all documents. Collections let you group similar documents together while keeping them all in the same database, without requiring that they be identical in structure.

With the example documents from earlier sections, you would likely have collections for `reading_material` and `office_supplies`. Your application is responsible for deciding which collection each document belongs in.

The following examples use the MongoDB API to show how to add, query, update, and delete documents.

**Topics**
+ [Adding documents](#document-database-adding-documents)
+ [Querying documents](#document-database-queries)
+ [Updating documents](#document-database-updating)
+ [Deleting documents](#document-database-deleting)

## Adding documents
<a name="document-database-adding-documents"></a>

In Amazon DocumentDB, a database is created when you first add a document to a collection in it. In this example, you create a collection named `example` in the `test` database, which is the default database when you connect to a cluster. Because the collection is created implicitly when the first document is inserted, the collection name is not checked for errors. A typo in the collection name, such as `eexample` instead of `example`, creates a new `eexample` collection and adds the document to it instead of to the intended collection. Your application must handle this error checking.
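Because a misspelled collection name silently creates a new collection, one option is to route collection names through a check in your application. The following is a minimal sketch in plain JavaScript; `ALLOWED_COLLECTIONS` and `checkedCollectionName()` are hypothetical names, not part of the MongoDB API, and the allowlist contents are assumptions for illustration.

```javascript
// Hypothetical application-side guard. Amazon DocumentDB implicitly creates
// a collection on first insert, so the application checks names itself.
const ALLOWED_COLLECTIONS = new Set(["example", "reading_material", "office_supplies"]);

function checkedCollectionName(name) {
  if (!ALLOWED_COLLECTIONS.has(name)) {
    // Fail loudly instead of letting a typo create a new collection.
    throw new Error(`Unknown collection name: "${name}"`);
  }
  return name;
}

console.log(checkedCollectionName("example")); // prints: example

try {
  checkedCollectionName("eexample"); // typo for "example"
} catch (err) {
  console.log(err.message); // prints: Unknown collection name: "eexample"
}
```

In the mongo shell, you could then look up collections with something like `db.getCollection(checkedCollectionName("example"))`, so that a typo fails before any write happens.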

The following examples use the MongoDB API to add documents.

**Topics**
+ [Adding a single document](#document-database-adding-documents-single)
+ [Adding multiple documents](#document-database-adding-documents-multiple)

### Adding a single document
<a name="document-database-adding-documents-single"></a>

To add a single document to a collection, use the `insertOne( {} )` operation with the document that you want added to the collection.

```
db.example.insertOne(
    {
        "Item": "Ruler",
        "Colors": ["Red","Green","Blue","Clear","Yellow"],
        "Inventory": {
            "OnHand": 47,
            "MinOnHand": 40
        },
        "UnitPrice": 0.89
    }
)
```

Output from this operation looks something like the following (JSON format).

```
{
    "acknowledged" : true,
    "insertedId" : ObjectId("5bedafbcf65ff161707de24f")
}
```

### Adding multiple documents
<a name="document-database-adding-documents-multiple"></a>

To add multiple documents to a collection, use the `insertMany( [{},...,{}] )` operation with a list of the documents that you want added to the collection. Although the documents in this particular list have different schemas, they can all be added to the same collection.

```
db.example.insertMany(
    [
        {
            "Item": "Pen",
            "Colors": ["Red","Green","Blue","Black"],
            "Inventory": {
                "OnHand": 244,
                "MinOnHand": 72 
            }
        },
        {
            "Item": "Poster Paint",
            "Colors": ["Red","Green","Blue","Black","White"],
            "Inventory": {
                "OnHand": 47,
                "MinOnHand": 50 
            }
        },
        {
            "Item": "Spray Paint",
            "Colors": ["Black","Red","Green","Blue"],
            "Inventory": {
                "OnHand": 47,
                "MinOnHand": 50,
                "OrderQnty": 36
            }
        }    
    ]
)
```

Output from this operation looks something like the following (JSON format).

```
{
    "acknowledged" : true,
    "insertedIds" : [
            ObjectId("5bedb07941ca8d9198f5934c"),
            ObjectId("5bedb07941ca8d9198f5934d"),
            ObjectId("5bedb07941ca8d9198f5934e")
    ]
}
```

## Querying documents
<a name="document-database-queries"></a>

At times, you might need to look up your online store's inventory so that customers can see and purchase what you're selling. Querying a collection is relatively easy, whether you want all documents in the collection or only those documents that satisfy a particular criterion.

To query for documents, use the `find()` operation. The `find()` operation takes a query document that defines the criteria for choosing which documents to return. By default, the shell prints each returned document as a single line of text with no line breaks. To format the output for easier reading, use `find().pretty()`. The examples in this topic use `.pretty()` to format the output.

These examples use the four documents that you inserted into the `example` collection in the preceding two exercises (`insertOne()` and `insertMany()`).

**Topics**
+ [Retrieving all documents in a collection](#document-database-queries-all-documents)
+ [Retrieving documents that match a field value](#document-database-queries-match-criteria)
+ [Retrieving documents that match an embedded document](#document-database-queries-entire-embedded-document)
+ [Retrieving documents that match a field value in an embedded document](#document-database-queries-embeded-document-field)
+ [Retrieving documents that match an array](#document-database-queries-array-match)
+ [Retrieving documents that match a value in an array](#document-database-queries-array-value-match)
+ [Retrieving documents using operators](#document-database-query-operators)

### Retrieving all documents in a collection
<a name="document-database-queries-all-documents"></a>

To retrieve all the documents in your collection, use the `find()` operation with an empty query document.

The following query returns all documents in the `example` collection.

```
db.example.find( {} ).pretty()
```

### Retrieving documents that match a field value
<a name="document-database-queries-match-criteria"></a>

To retrieve all documents that match a field and value, use the `find()` operation with a query document that identifies the fields and values to match.

Using the preceding documents, this query returns all documents where the "Item" field equals "Pen".

```
db.example.find( { "Item": "Pen" } ).pretty()
```

### Retrieving documents that match an embedded document
<a name="document-database-queries-entire-embedded-document"></a>

To find all the documents that match an embedded document, use the `find()` operation with a query document that specifies the embedded document name and all the fields and values for that embedded document.

When matching an embedded document, the stored embedded document must have the same name as the one in the query. In addition, its fields and values must match the query exactly, with no extra or missing fields.

The following query returns only the "Poster Paint" document. "Ruler" has a different value for `MinOnHand`, "Pen" has different values for `OnHand` and `MinOnHand`, and "Spray Paint" has one more field (`OrderQnty`) than the query document.

```
db.example.find({"Inventory": {
    "OnHand": 47,
    "MinOnHand": 50 } } ).pretty()
```
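To see why only "Poster Paint" matches, the following minimal sketch models whole-embedded-document matching in plain JavaScript over the four example documents (trimmed to the fields that matter). It is an illustration only, not how Amazon DocumentDB evaluates queries, and it ignores field order, which MongoDB also treats as significant for this kind of match. `embeddedDocEquals()` is a hypothetical helper.

```javascript
// In-memory model only; not how Amazon DocumentDB evaluates queries.
// The stored embedded document must have exactly the query's fields and values.
const docs = [
  { Item: "Ruler", Inventory: { OnHand: 47, MinOnHand: 40 } },
  { Item: "Pen", Inventory: { OnHand: 244, MinOnHand: 72 } },
  { Item: "Poster Paint", Inventory: { OnHand: 47, MinOnHand: 50 } },
  { Item: "Spray Paint", Inventory: { OnHand: 47, MinOnHand: 50, OrderQnty: 36 } },
];

// Assumes flat embedded documents whose values are primitives.
function embeddedDocEquals(stored, query) {
  if (stored === null || typeof stored !== "object") return false;
  const storedKeys = Object.keys(stored);
  const queryKeys = Object.keys(query);
  // An extra or missing field means no match.
  if (storedKeys.length !== queryKeys.length) return false;
  return queryKeys.every((key) => key in stored && stored[key] === query[key]);
}

const query = { OnHand: 47, MinOnHand: 50 };
const matches = docs
  .filter((doc) => embeddedDocEquals(doc.Inventory, query))
  .map((doc) => doc.Item);
console.log(matches); // [ 'Poster Paint' ]
```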

### Retrieving documents that match a field value in an embedded document
<a name="document-database-queries-embeded-document-field"></a>

To find all the documents that match specific fields in an embedded document, use the `find()` operation with a query document that uses "dot notation" to name each embedded field and the value to match.

Given the preceding documents, the following query uses dot notation to specify the embedded document and the fields of interest. Any document whose embedded fields match these values is returned, regardless of what other fields the embedded document contains. The query returns "Poster Paint" and "Spray Paint" because both match the specified fields and values.

```
db.example.find({"Inventory.OnHand": 47, "Inventory.MinOnHand": 50 }).pretty()
```
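Unlike a whole-embedded-document match, dot notation checks each named field on its own. The following in-memory model (plain JavaScript, not Amazon DocumentDB's implementation) resolves each dotted path separately; `getPath()` and `matchesAll()` are hypothetical helper names.

```javascript
// In-memory model only; not how Amazon DocumentDB evaluates queries.
// Each dotted path is resolved separately, so extra fields are ignored.
const docs = [
  { Item: "Ruler", Inventory: { OnHand: 47, MinOnHand: 40 } },
  { Item: "Pen", Inventory: { OnHand: 244, MinOnHand: 72 } },
  { Item: "Poster Paint", Inventory: { OnHand: 47, MinOnHand: 50 } },
  { Item: "Spray Paint", Inventory: { OnHand: 47, MinOnHand: 50, OrderQnty: 36 } },
];

// Walk "Inventory.OnHand" one segment at a time; undefined if any part is missing.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value === undefined || value === null ? undefined : value[key]), doc);
}

// A document matches when every dotted path holds the expected value.
function matchesAll(doc, query) {
  return Object.entries(query).every(([path, expected]) => getPath(doc, path) === expected);
}

const query = { "Inventory.OnHand": 47, "Inventory.MinOnHand": 50 };
const matches = docs.filter((doc) => matchesAll(doc, query)).map((doc) => doc.Item);
console.log(matches); // [ 'Poster Paint', 'Spray Paint' ]
```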

### Retrieving documents that match an array
<a name="document-database-queries-array-match"></a>

To find all documents that match an array, use the `find()` operation with the array name that you are interested in and all the values in that array. The query returns all documents that have an array with that name in which the array values are identical to and in the same order as in the query.

The following query returns only "Pen". "Ruler" has different colors (Clear and Yellow instead of Black), "Poster Paint" has an additional color (White), and "Spray Paint" has the same colors in a different order.

```
db.example.find( { "Colors": ["Red","Green","Blue","Black"] } ).pretty() 
```
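The following minimal sketch models an exact array match in plain JavaScript: the stored array must have the same length and the same value at every position. It is an in-memory illustration only, and `arrayEquals()` is a hypothetical helper.

```javascript
// In-memory model only; not how Amazon DocumentDB evaluates queries.
const docs = [
  { Item: "Ruler", Colors: ["Red", "Green", "Blue", "Clear", "Yellow"] },
  { Item: "Pen", Colors: ["Red", "Green", "Blue", "Black"] },
  { Item: "Poster Paint", Colors: ["Red", "Green", "Blue", "Black", "White"] },
  { Item: "Spray Paint", Colors: ["Black", "Red", "Green", "Blue"] },
];

// Same length and the same value at every position (order matters).
function arrayEquals(stored, query) {
  return (
    Array.isArray(stored) &&
    stored.length === query.length &&
    query.every((value, i) => stored[i] === value)
  );
}

const query = ["Red", "Green", "Blue", "Black"];
const matches = docs.filter((doc) => arrayEquals(doc.Colors, query)).map((doc) => doc.Item);
console.log(matches); // [ 'Pen' ]
```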

### Retrieving documents that match a value in an array
<a name="document-database-queries-array-value-match"></a>

To find all the documents that have a particular array value, use the `find()` operation with the array name and the value that you're interested in.

```
db.example.find( { "Colors": "Red" } ).pretty() 
```

The preceding operation returns all four documents because each of them has an array named `Colors` that contains the value "`Red`". If you specify the value "`White`" instead, the query returns only "Poster Paint."
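When the query value is a single value rather than an array, it matches any array that contains that value anywhere. The following in-memory sketch models that rule in plain JavaScript (not Amazon DocumentDB's implementation); `containsValue()` is a hypothetical helper.

```javascript
// In-memory model only; not how Amazon DocumentDB evaluates queries.
const docs = [
  { Item: "Ruler", Colors: ["Red", "Green", "Blue", "Clear", "Yellow"] },
  { Item: "Pen", Colors: ["Red", "Green", "Blue", "Black"] },
  { Item: "Poster Paint", Colors: ["Red", "Green", "Blue", "Black", "White"] },
  { Item: "Spray Paint", Colors: ["Black", "Red", "Green", "Blue"] },
];

// A single query value matches if any array element equals it;
// a non-array field must equal the value directly.
function containsValue(stored, value) {
  return Array.isArray(stored) ? stored.includes(value) : stored === value;
}

const red = docs.filter((doc) => containsValue(doc.Colors, "Red")).map((doc) => doc.Item);
const white = docs.filter((doc) => containsValue(doc.Colors, "White")).map((doc) => doc.Item);
console.log(red); // [ 'Ruler', 'Pen', 'Poster Paint', 'Spray Paint' ]
console.log(white); // [ 'Poster Paint' ]
```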

### Retrieving documents using operators
<a name="document-database-query-operators"></a>

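Query operators let you match on conditions other than equality. The following query uses the `$lt` (less than) operator to return all documents where the "`Inventory.OnHand`" value is less than 50. Given the preceding documents, it returns "Ruler", "Poster Paint", and "Spray Paint".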

```
db.example.find(
    { "Inventory.OnHand": { $lt: 50 } } ).pretty()
```

For a listing of supported query operators, see [Query and projection operators](mongo-apis.md#mongo-apis-query). 
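The following minimal sketch models the `$lt` condition in plain JavaScript over the example documents. It is a simplification: it compares only numeric values, treats a missing field as no match, and does not model the full comparison rules for mixed types. `getPath()` and `lessThan()` are hypothetical helpers.

```javascript
// In-memory model only; not how Amazon DocumentDB evaluates queries.
const docs = [
  { Item: "Ruler", Inventory: { OnHand: 47 } },
  { Item: "Pen", Inventory: { OnHand: 244 } },
  { Item: "Poster Paint", Inventory: { OnHand: 47 } },
  { Item: "Spray Paint", Inventory: { OnHand: 47 } },
];

// Resolve a dotted path such as "Inventory.OnHand"; undefined if missing.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value === undefined || value === null ? undefined : value[key]), doc);
}

// Models { path: { $lt: limit } } for numeric values only.
function lessThan(doc, path, limit) {
  const value = getPath(doc, path);
  return typeof value === "number" && value < limit;
}

const matches = docs.filter((doc) => lessThan(doc, "Inventory.OnHand", 50)).map((doc) => doc.Item);
console.log(matches); // [ 'Ruler', 'Poster Paint', 'Spray Paint' ]
```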

## Updating documents
<a name="document-database-updating"></a>

Typically, your documents are not static and are updated as part of your application workflows. The following examples show some of the ways that you can update documents.

To update an existing document, use the `update()` operation. The `update()` operation takes two required document parameters. The first document identifies which document or documents to update. The second document specifies the updates to make. By default, `update()` modifies only the first document that matches the first parameter.

When you use `$set` on an existing field, whether that field is a simple field, an array, or an embedded document, you specify the field name and its new value. At the end of the operation, it is as though the old field has been replaced by the new field and values.

**Topics**
+ [Updating the values of an existing field](#document-database-updating-existing-fields)
+ [Adding a new field](#document-database-updating-adding-field)
+ [Replacing an embedded document](#document-database-replacing-embedded-document)
+ [Inserting new fields into an embedded document](#document-database-updating-adding-field-embedded)
+ [Removing a field from a document](#document-database-remove-field)
+ [Removing a field from multiple documents](#document-database-remove-field-all)

### Updating the values of an existing field
<a name="document-database-updating-existing-fields"></a>

The following update operations use the four documents that you added earlier.

```
{
    "Item": "Ruler",
    "Colors": ["Red","Green","Blue","Clear","Yellow"],
    "Inventory": {
        "OnHand": 47,
        "MinOnHand": 40
    },
    "UnitPrice": 0.89
},
{
    "Item": "Pen",
    "Colors": ["Red","Green","Blue","Black"],
    "Inventory": {
        "OnHand": 244,
        "MinOnHand": 72 
    }
},
{
    "Item": "Poster Paint",
    "Colors": ["Red","Green","Blue","Black","White"],
    "Inventory": {
        "OnHand": 47,
        "MinOnHand": 50 
    }
},
{
    "Item": "Spray Paint",
    "Colors": ["Black","Red","Green","Blue"],
    "Inventory": {
        "OnHand": 47,
        "MinOnHand": 50,
        "OrderQnty": 36
    }
}
```

**To update a simple field**  
To update a simple field, use `update()` with `$set` to specify the field name and new value. The following example changes the `Item` from "Pen" to "Gel Pen".

```
db.example.update(
    { "Item" : "Pen" },
    { $set: { "Item": "Gel Pen" } }
)
```

Results from this operation look something like the following.

```
{
    "Item": "Gel Pen",
    "Colors": ["Red","Green","Blue","Black"],
    "Inventory": {
        "OnHand": 244,
        "MinOnHand": 72 
    }
}
```

**To update an array**  
The following example replaces the existing array of colors with a new array that includes `Orange` and drops `White` from the list of colors. The new list of colors is in the order specified in the `update()` operation.

```
db.example.update(
    { "Item" : "Poster Paint" },
    { $set: { "Colors": ["Red","Green","Blue","Orange","Black"] } }
)
```

Results from this operation look something like the following.

```
{
    "Item": "Poster Paint",
    "Colors": ["Red","Green","Blue","Orange","Black"],
    "Inventory": {
        "OnHand": 47,
        "MinOnHand": 50 
    }
}
```
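The following minimal sketch models `$set` on top-level fields in plain JavaScript: each listed field's value is replaced as a whole, so an array is swapped out rather than merged. The same assignment also adds a field that doesn't exist yet, as in Adding a new field. `applySet()` is a hypothetical helper, not part of any driver.

```javascript
// In-memory model of $set on top-level fields (not DocumentDB's code).
function applySet(doc, setSpec) {
  // Deep-copy plain JSON data so the caller's object is left unchanged.
  const updated = JSON.parse(JSON.stringify(doc));
  for (const [field, value] of Object.entries(setSpec)) {
    updated[field] = value; // replaces the old value, or adds a new field
  }
  return updated;
}

const pen = { Item: "Pen", Colors: ["Red", "Green", "Blue", "Black"] };
const posterPaint = { Item: "Poster Paint", Colors: ["Red", "Green", "Blue", "Black", "White"] };

console.log(applySet(pen, { Item: "Gel Pen" }).Item); // Gel Pen
console.log(applySet(posterPaint, { Colors: ["Red", "Green", "Blue", "Orange", "Black"] }).Colors);
// [ 'Red', 'Green', 'Blue', 'Orange', 'Black' ]
```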

### Adding a new field
<a name="document-database-updating-adding-field"></a>

To modify a document by adding one or more new fields, use the `update()` operation with a query document that identifies the document to modify, and an update document that uses the `$set` operator to specify the new fields and values.

The following example adds the field `UnitPrice` with the value `3.99` to the "Spray Paint" document. Note that the value `3.99` is numeric, not a string.

```
db.example.update(
    { "Item": "Spray Paint" },
    { $set: { "UnitPrice": 3.99 } } 
)
```

Results from this operation look something like the following (JSON format).

```
{
    "Item": "Spray Paint",
    "Colors": ["Black","Red","Green","Blue"],
    "Inventory": {
        "OnHand": 47,
        "MinOnHand": 50,
        "OrderQnty": 36
    },
    "UnitPrice": 3.99
}
```

### Replacing an embedded document
<a name="document-database-replacing-embedded-document"></a>

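To modify a document by replacing an embedded document, use the `update()` operation with a query document that identifies the document, and the `$set` operator to specify the embedded document and its new fields and values. The new embedded document replaces the old one entirely, so any fields that you omit are removed.

Suppose that you add the following document to the `example` collection.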

```
db.example.insertOne({
    "DocName": "Document 1",
    "Date": {
        "Year": 1987,
        "Month": 4,
        "Day": 18
    }
})
```

**To replace an embedded document**  
The following example replaces the current Date document with a new one that has only the fields `Month` and `Day`; `Year` has been eliminated.

```
db.example.update(
    { "DocName" : "Document 1" },
    { $set: { "Date": { "Month": 4, "Day": 18 } } }
)
```

Results from this operation look something like the following.

```
{
    "DocName": "Document 1",
    "Date": {
        "Month": 4,
        "Day": 18
    }
}
```

### Inserting new fields into an embedded document
<a name="document-database-updating-adding-field-embedded"></a>

**To add fields to an embedded document**  
To modify a document by adding one or more new fields to an embedded document, use the `update()` operation with the `$set` operator, and use "dot notation" to specify the embedded document and the new fields and values to insert.

Given the following document, the following code uses dot notation to insert the `Year` and `DoW` fields into the embedded `Date` document, and the `Words` field into the parent document.

```
{
    "DocName": "Document 1",
    "Date": {
        "Month": 4,
        "Day": 18
    }
}
```

```
db.example.update(
    { "DocName" : "Document 1" },
    { $set: { "Date.Year": 1987, 
              "Date.DoW": "Saturday",
              "Words": 2482 } }
)
```

Results from this operation look something like the following.

```
{
    "DocName": "Document 1",
    "Date": {
        "Month": 4,
        "Day": 18,
        "Year": 1987,
        "DoW": "Saturday"
    },
    "Words": 2482
}
```
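The following minimal sketch models `$set` with dot notation in plain JavaScript: each dotted path updates only its own field and creates it if it's missing, so the existing `Month` and `Day` fields survive. Throwing an error when a path runs through a value that isn't an embedded document is a simplifying assumption of this sketch. `applyDottedSet()` is a hypothetical helper.

```javascript
// In-memory model of $set with dot notation (not DocumentDB's code).
// Each "a.b" path updates only that field, so sibling fields are kept.
function applyDottedSet(doc, setSpec) {
  const updated = JSON.parse(JSON.stringify(doc)); // leave the input unchanged
  for (const [path, value] of Object.entries(setSpec)) {
    const keys = path.split(".");
    let target = updated;
    for (const key of keys.slice(0, -1)) {
      if (target[key] === undefined) target[key] = {}; // create missing parents
      if (typeof target[key] !== "object" || target[key] === null) {
        throw new Error(`Cannot set "${path}": "${key}" is not an embedded document`);
      }
      target = target[key];
    }
    target[keys[keys.length - 1]] = value;
  }
  return updated;
}

const doc = { DocName: "Document 1", Date: { Month: 4, Day: 18 } };
const updated = applyDottedSet(doc, {
  "Date.Year": 1987,
  "Date.DoW": "Saturday",
  Words: 2482,
});
console.log(JSON.stringify(updated));
// {"DocName":"Document 1","Date":{"Month":4,"Day":18,"Year":1987,"DoW":"Saturday"},"Words":2482}
```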

### Removing a field from a document
<a name="document-database-remove-field"></a>

To modify a document by removing a field from the document, use the `update()` operation with a query document that identifies the document to remove the field from, and the `$unset` operator to specify the field to remove.

The following example removes the `Words` field from the preceding document.

```
db.example.update(
    { "DocName" : "Document 1" },
    { $unset: { Words:1 } }
)
```

Results from this operation look something like the following.

```
{
    "DocName": "Document 1",
    "Date": {
        "Month": 4,
        "Day": 18,
        "Year": 1987,
        "DoW": "Saturday"
    }
}
```
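The following minimal sketch models `$unset` in plain JavaScript. As in MongoDB, the value paired with each field name (the `1` in `{ Words: 1 }`) is ignored, and unsetting a field that doesn't exist changes nothing. `applyUnset()` is a hypothetical helper.

```javascript
// In-memory model of $unset (not DocumentDB's code). Only the field
// names in unsetSpec matter; their values are ignored.
function applyUnset(doc, unsetSpec) {
  const updated = JSON.parse(JSON.stringify(doc)); // leave the input unchanged
  for (const path of Object.keys(unsetSpec)) {
    const keys = path.split(".");
    let parent = updated;
    for (const key of keys.slice(0, -1)) {
      parent = parent !== null && typeof parent === "object" ? parent[key] : undefined;
    }
    // Removing a field that doesn't exist is not an error; nothing happens.
    if (parent !== null && typeof parent === "object") {
      delete parent[keys[keys.length - 1]];
    }
  }
  return updated;
}

const doc = {
  DocName: "Document 1",
  Date: { Month: 4, Day: 18, Year: 1987, DoW: "Saturday" },
  Words: 2482,
};
console.log(Object.keys(applyUnset(doc, { Words: 1 }))); // [ 'DocName', 'Date' ]
```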

### Removing a field from multiple documents
<a name="document-database-remove-field-all"></a>

To remove a field from multiple documents, use the `update()` operation with the `$unset` operator and the `multi` option set to `true`.

The following example removes the `Inventory` field from all documents in the `example` collection. If a document does not have the `Inventory` field, that document is left unchanged. If you omit `multi: true`, the operation affects only the first document that matches the query.

```
db.example.update(
    {},
    { $unset: { Inventory:1 } },
    { multi: true }
)
```
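The following in-memory sketch models how the `multi` option limits `update()`. The document list, `modelUpdate()`, and the filter and modifier functions are hypothetical; an empty query document `{}` is modeled as a function that matches every document.

```javascript
// In-memory model (not DocumentDB's code): without multi: true,
// only the first matching document is modified.
function modelUpdate(docs, matches, modify, options = {}) {
  let matched = 0;
  for (const doc of docs) {
    if (!matches(doc)) continue;
    modify(doc);
    matched += 1;
    if (!options.multi) break; // stop after the first match
  }
  return matched;
}

const makeDocs = () => [
  { Item: "Ruler", Inventory: { OnHand: 47 } },
  { Item: "Pen", Inventory: { OnHand: 244 } },
  { DocName: "Document 1" }, // has no Inventory field
];
const matchAll = () => true; // models the empty query document {}
const unsetInventory = (doc) => {
  delete doc.Inventory; // no-op when the field is absent
};

const single = makeDocs();
modelUpdate(single, matchAll, unsetInventory);
console.log(single.filter((d) => "Inventory" in d).length); // 1

const multi = makeDocs();
modelUpdate(multi, matchAll, unsetInventory, { multi: true });
console.log(multi.filter((d) => "Inventory" in d).length); // 0
```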

## Deleting documents
<a name="document-database-deleting"></a>

To remove documents from a collection, use the `remove()` operation with a query document that specifies which documents to remove. By default, `remove()` deletes every document that matches the query. The following code removes "Gel Pen" from your `example` collection.

```
db.example.remove( { "Item": "Gel Pen" } )
```

To remove all documents from a collection, use the `remove()` operation with an empty query document. The collection itself is not dropped.

```
db.example.remove( { } )
```
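The following minimal sketch models `remove()` in plain JavaScript. The MongoDB shell's `remove()` also accepts an optional `justOne` argument that limits deletion to a single matching document; the sketch models that argument too, assuming Amazon DocumentDB behaves the same way. `modelRemove()` is a hypothetical helper.

```javascript
// In-memory model of remove() (not DocumentDB's code): every matching
// document is deleted unless justOne is true.
function modelRemove(docs, matches, justOne = false) {
  const kept = [];
  let removed = 0;
  for (const doc of docs) {
    if (matches(doc) && !(justOne && removed === 1)) {
      removed += 1; // drop this document
    } else {
      kept.push(doc);
    }
  }
  return { kept, removed };
}

const docs = [
  { Item: "Ruler" },
  { Item: "Gel Pen" },
  { Item: "Poster Paint" },
  { Item: "Spray Paint" },
];

console.log(modelRemove(docs, (d) => d.Item === "Gel Pen").removed); // 1
console.log(modelRemove(docs, () => true).removed); // 4 (empty query)
console.log(modelRemove(docs, () => true, true).removed); // 1 (justOne)
```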