# Best practices for Amazon Route 53
<a name="best-practices"></a>

This section provides best practices for various components of Amazon Route 53, including:

1. **DNS best practices:**
   + Understand the trade-offs between time to live (TTL) values and responsiveness versus reliability.
   + Use alias records instead of CNAME records when possible for improved performance and cost savings.
   + Configure default routing policies to ensure all clients receive a response.
   + Leverage latency-based routing for minimizing application latency and geolocation/geoproximity routing for stability and predictability.
   + Verify change propagation using the `GetChange` API for automated workflows.
   + Delegate subdomains from the parent zone for consistent routing. 
   + Avoid large single responses by using multivalue answer routing.

1. **Resolver best practices:**
   + Prevent routing loops by avoiding associating the same VPC with both a Resolver rule and its inbound endpoint.
   + Implement security group rules to reduce connection tracking overhead and maximize query throughput.
   + Configure inbound endpoints with IP addresses in multiple Availability Zones for redundancy.
   + Be aware of potential DNS zone walking attacks and contact AWS Support if your endpoints experience throttling.

1. **Health checks best practices:**
   + Follow recommendations for optimizing Amazon Route 53 health checks to ensure reliable monitoring of your resources.

By adhering to these best practices, you can optimize the performance, reliability, and security of your DNS infrastructure, ensuring efficient and effective routing of traffic to your applications and services.

**Topics**
+ [Best practices for Amazon Route 53 DNS](best-practices-dns.md)
+ [Best practices for VPC Resolver](best-practices-resolver.md)
+ [Best practices for Amazon Route 53 health checks](best-practices-healthchecks.md)

# Best practices for Amazon Route 53 DNS
<a name="best-practices-dns"></a>

Follow these best practices to get the best results when using the Amazon Route 53 DNS service.

**Use data plane functions for DNS failover and app recovery**  
The data planes for Route 53, including health checks, and Amazon Application Recovery Controller (ARC) routing control are globally distributed, and are designed for 100% availability and functionality, even during severe events. They integrate with each other and don't depend on control plane functionality. While the control planes for these services, including their consoles, are generally very reliable, they're designed in a more centralized way and prioritize durability and consistency rather than high availability. For scenarios such as failover during disaster recovery, we recommend that you use features like Route 53 health checks and ARC routing control that rely on data plane functionality to update DNS. For more information, see [Control and data plane concepts](route-53-concepts.md#route-53-concepts-control-and-data-plane) and [Blog: Creating Disaster Recovery Mechanisms Using Amazon Route 53](https://aws.amazon.com/blogs/networking-and-content-delivery/creating-disaster-recovery-mechanisms-using-amazon-route-53/).

**Choosing TTL values for DNS records**  
The DNS TTL is the numeric value (in seconds) that DNS resolvers use to decide how long a record can be cached for without making another query to Route 53. All DNS records must have a TTL specified for them. The recommended range for TTL values is 60 to 172,800 seconds.  
The choice of a TTL is a trade-off between latency and reliability on one side, and responsiveness to change on the other. With shorter TTLs on a record, DNS resolvers notice updates to the record more quickly because they must query more frequently. This increases the query volume (and cost). As you lengthen the TTL, DNS resolvers answer queries from cache more often, which is typically faster, cheaper, and in some situations, more reliable, because it avoids queries across the internet. There is no single correct value, but it is worthwhile to think about whether responsiveness or reliability is more important to you.  
Things to consider when you set TTL values include:  
+ Set DNS record TTLs for the length of time that you can afford to wait for a change to take effect. This is especially true on delegations (NS record sets), or other records that rarely change, for example MX records. For these records, longer TTLs are recommended. A value between an hour (3600s) and a day (86,400s) is a common choice.
+ For records that need to be altered as part of a rapid failover mechanism (especially records that are health checked), lower TTLs are appropriate. Setting a TTL of 60 or 120 seconds is a common choice for this scenario.
+ When you want to make changes to critical DNS entries, we recommend that you temporarily shorten the TTLs. Then you can make the changes, observe, and rollback quickly if you need to. After the changes are finalized and working as expected, you can increase the TTL.
For more information, see [TTL (seconds)](resource-record-sets-values-shared.md#rrsets-values-common-ttl).
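
The "shorten, change, restore" sequence described above can be sketched as three change batches. This is a minimal illustration, not a definitive procedure: the helper name `build_ttl_upsert`, the record name, and the IP addresses are assumptions made for the example. The dictionary layout follows the `ChangeBatch` structure accepted by the Route 53 `ChangeResourceRecordSets` API.

```python
def build_ttl_upsert(name: str, record_type: str, values: list[str], ttl: int) -> dict:
    """Build a ChangeBatch that upserts one record with the given TTL (hypothetical helper)."""
    if ttl < 0:
        raise ValueError("TTL must be non-negative")
    return {
        "Comment": f"Set {name} {record_type} with TTL {ttl}s",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": record_type,
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": v} for v in values],
                },
            }
        ],
    }


# Step 1: shorten the TTL while keeping the current value. Resolvers may keep
# serving the old answer until the previous TTL has expired.
shorten = build_ttl_upsert("www.example.com.", "A", ["192.0.2.10"], 60)

# Step 2: make the actual change while the TTL is short, so a rollback is quick.
change = build_ttl_upsert("www.example.com.", "A", ["192.0.2.20"], 60)

# Step 3: after the change is verified, restore a longer TTL.
restore = build_ttl_upsert("www.example.com.", "A", ["192.0.2.20"], 3600)
```

Each dictionary could be passed as the `ChangeBatch` argument of the boto3 `change_resource_record_sets` call, together with the ID of your hosted zone.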

**CNAME records**  
   
 DNS CNAME records are a way to point one domain name to another. If a DNS resolver resolves domain-1.example.com and finds a CNAME pointing at domain-2.example.com, the DNS resolver must proceed to resolve domain-2.example.com before it can respond. These records are useful in many situations, for example, to ensure consistency when a website has more than one domain name.   
However, DNS resolvers must make more queries to answer CNAMEs, which increases latency and costs. Where possible, a faster and cheaper alternative is to use a Route 53 alias record. Alias records allow Route 53 to respond with a direct answer for AWS resources (for example, a load balancer) and for other domains within the same hosted zone.  
For more information, see [Routing internet traffic to your AWS resources](routing-to-aws-resources.md).
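
As a rough sketch of what an alias record looks like in a change batch, the following helper builds one that points a name at a load balancer. The helper name, the load balancer DNS name, and the hosted zone ID `ZEXAMPLE12345` are placeholders invented for the example; substitute the values for your own resource. Note that an alias record has an `AliasTarget` instead of a `TTL` and a list of values.

```python
def build_alias_upsert(
    name: str,
    target_dns_name: str,
    target_hosted_zone_id: str,
    evaluate_target_health: bool = True,
) -> dict:
    """Build a ChangeBatch that upserts an alias A record (hypothetical helper)."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    # Alias records carry no TTL or ResourceRecords of their own.
                    "AliasTarget": {
                        "HostedZoneId": target_hosted_zone_id,
                        "DNSName": target_dns_name,
                        "EvaluateTargetHealth": evaluate_target_health,
                    },
                },
            }
        ]
    }


alias_batch = build_alias_upsert(
    name="www.example.com.",
    # Placeholder values: use the DNS name and hosted zone ID of your own load balancer.
    target_dns_name="my-load-balancer-1234567890.us-east-1.elb.amazonaws.com.",
    target_hosted_zone_id="ZEXAMPLE12345",
)
```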

**Advanced DNS routing**  
+ When using geolocation, geoproximity, or latency-based routing, always set a default, unless you want some clients to receive *no answer* responses.
+ To minimize application latency, use latency-based routing. This type of routing data can change frequently.
+ To provide routing stability and predictability, use either geolocation or geoproximity routing.
For more information, see [Geolocation routing](routing-policy-geo.md), [Geoproximity routing](routing-policy-geoproximity.md), and [Latency-based routing](routing-policy-latency.md).
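
To illustrate the default-record recommendation, the sketch below builds two geolocation records for the same name: one for a specific country and one default. In the Route 53 API, the default location is expressed as a `CountryCode` of `*`. The helper name, set identifiers, and IP addresses are assumptions made for the example.

```python
def build_geolocation_record(
    name: str, set_identifier: str, ip_address: str, country_code: str = "*", ttl: int = 60
) -> dict:
    """Build a geolocation A record; country_code "*" marks the default record (hypothetical helper)."""
    return {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_identifier,
        "GeoLocation": {"CountryCode": country_code},
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip_address}],
    }


records = [
    build_geolocation_record("app.example.com.", "germany", "192.0.2.10", country_code="DE"),
    # Default record: answers queries from locations that match no other record.
    build_geolocation_record("app.example.com.", "default", "192.0.2.20"),
]

# A simple pre-deployment check that a default record is present.
has_default = any(r["GeoLocation"].get("CountryCode") == "*" for r in records)
```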

**DNS change propagation**  
When you create or update a record or hosted zone by using the Route 53 console or API, it takes some time for the change to be reflected across the internet. This is called *change propagation*. While propagation typically takes less than one minute globally, there are occasionally delays, for example, due to problems syncing to one location, or in rare cases, problems within the central control plane. If you are building automated provisioning workflows, and it is important to wait for change propagation to complete before you move forward with the next workflow step, use the [GetChange](https://docs.aws.amazon.com/Route53/latest/APIReference/API_GetChange.html) API to verify that your DNS changes have gone into effect (`Status` = `INSYNC`).
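
A minimal polling sketch for an automated workflow follows. It assumes a client object that exposes `get_change(Id=...)` and returns a `ChangeInfo` dictionary with a `Status` of `PENDING` or `INSYNC`, as the boto3 Route 53 client does. The function name, polling interval, and timeout are illustrative choices, not recommended values.

```python
import time


def wait_for_insync(
    client,
    change_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 300.0,
    sleep=time.sleep,
) -> bool:
    """Poll GetChange until the change is INSYNC. Return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_change(Id=change_id)["ChangeInfo"]["Status"]
        if status == "INSYNC":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)
```

With boto3, `client` would be `boto3.client("route53")` and `change_id` would be the `ChangeInfo.Id` value returned by `change_resource_record_sets`. boto3 also provides a `resource_record_sets_changed` waiter that wraps the same API call.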

**DNS delegation**  
When you delegate multiple levels of subdomains in DNS, it is important to always delegate from the parent zone. For example, if you are delegating www.dept.example.com, you should do so from the dept.example.com zone, not from the example.com zone. Delegations from a *grandparent* to a *child* zone might not work at all or work only inconsistently. For more information, see [Routing traffic for subdomains](dns-routing-traffic-for-subdomains.md).
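
The following sketch shows one way to work out which zone a delegation belongs in and what the NS record might look like. It assumes that every label is hosted as its own zone, as in the example above, which is not true of every DNS hierarchy. The helper names and the name server names are hypothetical.

```python
def parent_zone(name: str) -> str:
    """Return the immediate parent of a domain name, for example dept.example.com. for www.dept.example.com."""
    labels = name.rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"{name!r} has no parent zone")
    return ".".join(labels[1:]) + "."


def build_delegation(child_zone: str, name_servers: list[str], ttl: int = 86400) -> dict:
    """Build the NS record set to create in the parent zone (hypothetical helper)."""
    return {
        "Name": child_zone,
        "Type": "NS",
        "TTL": ttl,
        "ResourceRecords": [{"Value": ns} for ns in name_servers],
    }


child = "www.dept.example.com."
# The NS record for the child belongs in the immediate parent zone, not the grandparent.
zone_to_edit = parent_zone(child)
delegation = build_delegation(child, ["ns-1.example.net.", "ns-2.example.net."])
```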

**Size of DNS response**  
Avoid creating large single responses. If responses are larger than 512 bytes, many DNS resolvers must retry over TCP instead of UDP, which can reduce reliability and lead to slower responses. We recommend using multivalue answer routing, which returns up to eight healthy records selected at random, to keep responses within the 512-byte boundary.  
For more information, see [Multivalue answer routing](routing-policy-multivalue.md) and [DNS Reply Size Test Server](https://www.dns-oarc.net/oarc/services/replysizetest/).
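
To get a feel for the 512-byte boundary, the sketch below estimates the size of a response that contains only A records. It is a back-of-the-envelope calculation under stated assumptions: the 12-byte header and record layout follow the standard DNS message format, answer names are assumed to use a 2-byte compression pointer, and the estimate ignores EDNS0 and any authority or additional records. The function names are made up for the example.

```python
HEADER_BYTES = 12
# Per A record: 2 (compressed name) + 2 (type) + 2 (class) + 4 (TTL) + 2 (length) + 4 (IPv4 address).
A_RECORD_BYTES = 16


def encoded_name_length(name: str) -> int:
    """Length of a domain name in wire format: one length byte per label plus the root byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def estimate_a_response_size(name: str, answer_count: int) -> int:
    """Estimate the size in bytes of a response carrying answer_count A records."""
    question_bytes = encoded_name_length(name) + 4  # name + 2 (type) + 2 (class)
    return HEADER_BYTES + question_bytes + answer_count * A_RECORD_BYTES


def max_a_records_within(name: str, limit: int = 512) -> int:
    """Largest number of A records that keeps the estimate within limit bytes."""
    fixed_bytes = estimate_a_response_size(name, 0)
    return max(0, (limit - fixed_bytes) // A_RECORD_BYTES)


size_with_eight = estimate_a_response_size("example.com", 8)
most_records = max_a_records_within("example.com")
```

Under these assumptions, eight A records for a short name leave a wide margin. Longer names, AAAA records (28 bytes each by the same arithmetic), and extra sections all reduce it.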

# Best practices for VPC Resolver
<a name="best-practices-resolver"></a>

This section provides best practices for optimizing Amazon Route 53 VPC Resolver, covering the following topics:

1. **Avoiding loop configurations with Resolver endpoints:**
   + Prevent routing loops by ensuring that the same VPC is not associated with both a Resolver rule and its inbound endpoint.
   + Utilize AWS RAM to share VPCs across accounts while maintaining proper routing configurations.

   For more information, see [Avoid loop configurations with Resolver endpoints](best-practices-resolver-endpoints.md).

1. **Scaling Resolver endpoints:**
   + Implement security group rules that permit traffic based on connection state to reduce connection tracking overhead.
   + Follow recommended security group rules for inbound and outbound Resolver endpoints to maximize query throughput.
   + Monitor unique IP address and port combinations generating DNS traffic to avoid capacity limitations.

   For more information, see [Resolver endpoint scaling](best-practices-resolver-endpoint-scaling.md).

1. **High availability for Resolver endpoints:**
   + Create inbound endpoints with IP addresses in at least two Availability Zones for redundancy.
   + Provision additional network interfaces to ensure availability during maintenance or traffic surges.

   For more information, see [High availability for Resolver endpoints](best-practices-resolver-endpoint-high-availability.md).

1. **Preventing DNS zone walking attacks:**
   + Be aware of potential DNS zone walking attacks, where attackers attempt to retrieve all content from DNSSEC-signed DNS zones.
   + If your endpoints experience throttling due to suspected zone walking, contact AWS Support for assistance. 

   For more information, see [DNS zone walking](best-practices-resolver-zone-walking.md).

By following these best practices, you can optimize the performance, scalability, and security of your VPC Resolver deployments, ensuring reliable and efficient DNS resolution for your applications and resources.

# Avoid loop configurations with Resolver endpoints
<a name="best-practices-resolver-endpoints"></a>

Don't associate the same VPC with a Resolver rule and its inbound endpoint (whether it’s a direct target of the endpoint, or via an on-premises DNS server). When the outbound endpoint in a Resolver rule points to an inbound endpoint that shares a VPC with the rule, it can cause a loop where the query is continually passed between the inbound and outbound endpoints.

The forwarding rule can still be associated with other VPCs that are shared with other accounts by using AWS Resource Access Manager (AWS RAM). Private hosted zones associated with the hub, or a central VPC, will still resolve from queries to inbound endpoints because a forwarding resolver rule does not change this resolution.
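
A configuration check for the direct-target case can be sketched as follows. The data classes are a simplified, hypothetical model of a forwarding rule and an inbound endpoint, not the shape of any AWS API response. The check does not cover loops that pass through an on-premises DNS server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundEndpoint:
    vpc_id: str
    ip_addresses: frozenset[str]


@dataclass(frozen=True)
class ForwardingRule:
    domain: str
    target_ips: frozenset[str]
    associated_vpc_ids: frozenset[str]


def find_loop_vpcs(rule: ForwardingRule, endpoint: InboundEndpoint) -> set[str]:
    """Return the VPCs that would loop: the rule targets the inbound endpoint
    and is also associated with the VPC that hosts that endpoint."""
    if rule.target_ips.isdisjoint(endpoint.ip_addresses):
        return set()
    return {endpoint.vpc_id} & set(rule.associated_vpc_ids)


hub_endpoint = InboundEndpoint("vpc-hub", frozenset({"10.0.0.10", "10.0.1.10"}))
risky_rule = ForwardingRule(
    "corp.example.com.", frozenset({"10.0.0.10"}), frozenset({"vpc-hub", "vpc-spoke"})
)
safe_rule = ForwardingRule("corp.example.com.", frozenset({"10.0.0.10"}), frozenset({"vpc-spoke"}))
```

Here `find_loop_vpcs(risky_rule, hub_endpoint)` flags `vpc-hub`, while the same rule associated only with the spoke VPC is not flagged.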

# Resolver endpoint scaling
<a name="best-practices-resolver-endpoint-scaling"></a>

Resolver endpoint security groups use connection tracking to gather information about traffic to and from the endpoints. Each endpoint interface has a maximum number of connections that can be tracked, and a high volume of DNS queries can exceed that maximum, which causes throttling and query loss. Connection tracking is the default AWS behavior for monitoring the state of traffic flowing through security groups. Using connection tracking in security groups reduces traffic throughput. However, you can implement untracked connections to reduce overhead and improve performance. For more information, see [Untracked connections](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-connection-tracking.html#untracked-connections).

If connection tracking is enforced, either because the security group rules are restrictive or because queries are routed through a Network Load Balancer (see [Automatically tracked connections](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-connection-tracking.html#automatic-tracking)), the overall maximum queries per second per IP address for an endpoint can be as low as 1,500.

**Ingress and egress security group rule recommendations for the inbound Resolver endpoint**

**Ingress rules**

| Protocol type | Port number | Source IP |
| --- | --- | --- |
| TCP | 53 | 0.0.0.0/0 |
| UDP | 53 | 0.0.0.0/0 |

**Egress rules**

| Protocol type | Port number | Destination IP |
| --- | --- | --- |
| TCP | All | 0.0.0.0/0 |
| UDP | All | 0.0.0.0/0 |

**Ingress and egress security group rule recommendations for the outbound Resolver endpoint**

**Ingress rules**

| Protocol type | Port number | Source IP |
| --- | --- | --- |
| TCP | All | 0.0.0.0/0 |
| UDP | All | 0.0.0.0/0 |

**Egress rules**

| Protocol type | Port number | Destination IP |
| --- | --- | --- |
| TCP | 53 | 0.0.0.0/0 |
| UDP | 53 | 0.0.0.0/0 |

**Note**  
**Security group port requirements:**  
**Inbound endpoints** require ingress rules allowing TCP and UDP on port 53 to receive DNS queries from your network. Egress rules can allow all ports since the endpoint may need to respond to queries from various source ports.
**Outbound endpoints** require egress rules allowing TCP and UDP access to the ports you're using for DNS queries on your network. Port 53 is shown in the example above because it's the most common DNS port, but your network might use different ports. Ingress rules can allow all ports to accommodate responses from your DNS servers.
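
The recommended rules can be expressed in the `IpPermissions` format that the Amazon EC2 security group APIs accept. The sketch below only builds the data structures. The helper name is an assumption, and "All" ports is represented here as the range 0 to 65535.

```python
ALL_PORTS = (0, 65535)
DNS_PORT = (53, 53)


def build_permissions(port_range: tuple[int, int], cidr: str = "0.0.0.0/0") -> list[dict]:
    """Build TCP and UDP permissions for one port range (hypothetical helper)."""
    from_port, to_port = port_range
    return [
        {
            "IpProtocol": protocol,
            "FromPort": from_port,
            "ToPort": to_port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for protocol in ("tcp", "udp")
    ]


# Inbound endpoint: receive DNS queries on port 53, reply to any source port.
inbound_endpoint_rules = {
    "ingress": build_permissions(DNS_PORT),
    "egress": build_permissions(ALL_PORTS),
}

# Outbound endpoint: send DNS queries to port 53, accept replies on any port.
# Adjust DNS_PORT if the DNS servers on your network listen on a different port.
outbound_endpoint_rules = {
    "ingress": build_permissions(ALL_PORTS),
    "egress": build_permissions(DNS_PORT),
}
```

With boto3, each list could be passed as `IpPermissions` to `authorize_security_group_ingress` or `authorize_security_group_egress` on an EC2 client.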

**Inbound Resolver endpoints**

For clients using an inbound Resolver endpoint, the capacity of the elastic network interface is affected if more than 40,000 unique IP address and port combinations generate the DNS traffic.
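
One way to watch for this limit is to count distinct source address and port pairs in a sample of query records. The sketch below assumes each record is a dictionary with `srcaddr` and `srcport` keys, similar to Resolver query log entries. The field names, function names, and sample data are assumptions to adapt to your own log source.

```python
UNIQUE_SOURCE_LIMIT = 40_000


def count_unique_sources(records: list[dict]) -> int:
    """Count distinct (source IP address, source port) combinations in the sample."""
    return len({(record["srcaddr"], record["srcport"]) for record in records})


def exceeds_source_limit(records: list[dict], limit: int = UNIQUE_SOURCE_LIMIT) -> bool:
    """Return True when the sample contains more unique combinations than the limit."""
    return count_unique_sources(records) > limit


sample = [
    {"srcaddr": "10.1.0.5", "srcport": 40001},
    {"srcaddr": "10.1.0.5", "srcport": 40001},  # Repeat of the same combination.
    {"srcaddr": "10.1.0.5", "srcport": 40002},
    {"srcaddr": "10.1.0.6", "srcport": 40001},
]
unique_sources = count_unique_sources(sample)
```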

# High availability for Resolver endpoints
<a name="best-practices-resolver-endpoint-high-availability"></a>

When you create your VPC Resolver inbound endpoints, Route 53 requires that you create at least two IP addresses that the DNS resolvers on your network will forward queries to. You should also specify IP addresses in at least two Availability Zones for redundancy. 

If you require more than one elastic network interface endpoint to be available at all times, we recommend that you create at least one more network interface than you need, to make sure you have additional capacity available for handling possible traffic surges. The additional network interface also ensures availability during service operations like maintenance or upgrades.

For more information, see this detailed blog article: [How to achieve DNS high availability with Resolver endpoints](https://aws.amazon.com/blogs/networking-and-content-delivery/how-to-achieve-dns-high-availability-with-route-53-resolver-endpoints/) and [Values that you specify when you create or edit inbound endpoints](resolver-forwarding-inbound-queries-values.md).
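
Before you create an endpoint, you could check a planned IP address list against these recommendations. The sketch below uses the `SubnetId` and `Ip` keys from the `IpAddresses` parameter of the `CreateResolverEndpoint` API. The function name, the subnet-to-Availability-Zone mapping, and the IDs are hypothetical.

```python
def validate_endpoint_ips(ip_addresses: list[dict], subnet_azs: dict[str, str]) -> list[str]:
    """Return a list of problems with the planned IP addresses (empty when none are found)."""
    problems = []
    if len(ip_addresses) < 2:
        problems.append("Specify at least two IP addresses.")
    zones = {subnet_azs[entry["SubnetId"]] for entry in ip_addresses}
    if len(zones) < 2:
        problems.append("Specify IP addresses in at least two Availability Zones.")
    return problems


# Hypothetical subnet IDs mapped to their Availability Zones.
subnet_azs = {"subnet-aaa": "us-east-1a", "subnet-bbb": "us-east-1b"}

redundant = [
    {"SubnetId": "subnet-aaa", "Ip": "10.0.0.10"},
    {"SubnetId": "subnet-bbb", "Ip": "10.0.1.10"},
]
single_zone = [
    {"SubnetId": "subnet-aaa", "Ip": "10.0.0.10"},
    {"SubnetId": "subnet-aaa", "Ip": "10.0.0.11"},
]
```

In this example, `validate_endpoint_ips(redundant, subnet_azs)` returns an empty list, while the `single_zone` plan is flagged because both addresses are in the same Availability Zone.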

# DNS zone walking
<a name="best-practices-resolver-zone-walking"></a>

A DNS zone walking attack attempts to get all content from DNSSEC-signed DNS zones. If the VPC Resolver team detects a traffic pattern on your endpoint that matches the patterns generated when DNS zones are walked, the service team will throttle the traffic on your endpoint. As a consequence, you might observe a high percentage of your DNS queries timing out.

If you observe reduced capacity on your endpoints and believe that the endpoints have been throttled erroneously, go to https://console.aws.amazon.com/support/home#/ to create a support case.

# Best practices for Amazon Route 53 health checks
<a name="best-practices-healthchecks"></a>

Effective health check configuration is essential for maintaining a highly available and resilient infrastructure. Here are some best practices to consider when setting up and managing Amazon Route 53 health checks: 

1. **Use elastic IP addresses for health check endpoints:**
   + Utilize elastic IP addresses for your health check endpoints to ensure consistent monitoring.
   + If you are no longer using an Amazon EC2 instance, remember to delete any associated health checks to avoid potential security risks or data compromise.

   For more information, see [Values that you specify when you create or update health checks](health-checks-creating-values.md).

1. **Configure appropriate health check intervals:**
   + Set health check intervals based on your application's requirements and the criticality of the monitored resources.
   + Shorter intervals provide faster failure detection but may increase Route 53 costs and load on your resources.
   + Longer intervals reduce costs and resource load but may delay failure detection.

   For more information, see [Advanced configuration ("Monitor an Endpoint" only)](health-checks-creating-values.md#health-checks-creating-values-advanced).

1. **Implement alarm notifications:**
   + Configure Amazon CloudWatch alarms to receive notifications when health checks fail or recover.
   + Set appropriate alarm thresholds based on your application's requirements and the expected behavior of your resources.
   + Integrate notifications with your monitoring and incident response processes.

   For more information, see [Monitoring health checks using CloudWatch](monitoring-health-checks.md).

1. **Utilize health check Regions strategically:**
   + Choose health check Regions based on the geographic distribution of your users and resources.
   + Consider using multiple health check Regions for critical resources to improve reliability and reduce the impact of regional outages.

1. **Monitor health check logs and metrics:** 
   + Regularly review Route 53 health check logs and CloudWatch metrics to identify potential issues or performance bottlenecks.
   + Analyze health check failure reasons and take appropriate actions to resolve underlying problems.

1. **Implement failover and failback strategies:**
   + Leverage Route 53 failover routing policies to automatically route traffic to healthy resources in the event of failures.
   + Plan and test failover and failback processes to ensure seamless transition during outages and recovery.

   For more information, see [Configuring DNS failover](dns-failover-configuring.md).

1. **Regularly review and update health checks:**
   + Update health check endpoints, intervals, and alarm thresholds as needed to maintain optimal monitoring and performance. 
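
The interval trade-off can be made concrete with a short sketch. The first helper builds a `HealthCheckConfig` dictionary in the shape accepted by the `CreateHealthCheck` API, where the request interval is 10 or 30 seconds and the failure threshold is between 1 and 10. The second helper is a rough estimate only: it assumes that the time to fail over is approximately the request interval multiplied by the failure threshold, plus the TTL of the record. Helper names, the IP address, and the path are assumptions made for the example.

```python
def build_health_check_config(
    ip_address: str,
    port: int = 80,
    path: str = "/health",
    request_interval: int = 30,
    failure_threshold: int = 3,
) -> dict:
    """Build an HTTP HealthCheckConfig after validating the interval and threshold."""
    if request_interval not in (10, 30):
        raise ValueError("request_interval must be 10 or 30 seconds")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("failure_threshold must be between 1 and 10")
    return {
        "IPAddress": ip_address,
        "Port": port,
        "Type": "HTTP",
        "ResourcePath": path,
        "RequestInterval": request_interval,
        "FailureThreshold": failure_threshold,
    }


def estimate_failover_seconds(request_interval: int, failure_threshold: int, record_ttl: int) -> int:
    """Rough estimate: time to detect the failure plus time for cached answers to expire."""
    return request_interval * failure_threshold + record_ttl


config = build_health_check_config("192.0.2.10", request_interval=10)
fast = estimate_failover_seconds(request_interval=10, failure_threshold=3, record_ttl=60)
standard = estimate_failover_seconds(request_interval=30, failure_threshold=3, record_ttl=60)
```

Under these assumptions, the shorter interval reduces the estimate from 150 seconds to 90 seconds. Actual timing depends on resolver behavior and on how the endpoint fails.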

By following these best practices, you can effectively leverage Amazon Route 53 health checks to monitor the health and availability of your resources, ensuring a reliable and high-performing infrastructure for your applications and services.