Appendix A - Partitional service guidance
For partitional services, you should implement static stability to maintain the resilience of your workload during an AWS service control plane impairment. The following sections provide prescriptive guidance on how to consider dependencies on partitional services, as well as what will and what may not work during a control plane impairment.
AWS Identity and Access Management (IAM)
The AWS Identity and Access Management (IAM) control plane consists of all public IAM APIs (including Access Advisor but not Access Analyzer or IAM Roles Anywhere). This includes actions like CreateRole, AttachRolePolicy, ChangePassword, UpdateSAMLProvider, and UpdateLoginProfile. The IAM data plane provides authentication and authorization for IAM principals in each AWS Region. During a control plane impairment, CRUDL-type operations for IAM may not work, but authentication and authorization for existing principals will continue to work. STS is a data plane-only service that is separate from IAM, and does not depend on the IAM control plane.
What this means is that when you are planning for dependencies on IAM, you should not rely on the IAM control plane in your recovery path. For example, a statically-stable design for a “break-glass” admin user would be to create a user with the appropriate permissions attached, set the password, provision the access key and secret access key, and then lock those credentials in a physical or virtual vault. When required during an emergency, retrieve the user credentials from the vault and use them as needed. A non-statically-stable design would be to provision the user during a failure, or to pre-provision the user but attach the admin policy only when required. Both of these approaches depend on the IAM control plane.
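One way to apply this guidance is to review a recovery runbook for steps that call IAM. The following is a minimal sketch, assuming the runbook is written as a list of "service:Action" strings; that format and the function name find_iam_control_plane_steps are hypothetical. Because the IAM control plane consists of all public IAM APIs, the sketch flags every step with the iam prefix and leaves STS calls alone.

```python
from typing import Iterable, List


def find_iam_control_plane_steps(steps: Iterable[str]) -> List[str]:
    """Return the runbook steps that depend on the IAM control plane.

    Each step is assumed to be a "service:Action" string (a hypothetical
    runbook format). Every public IAM API is a control plane operation,
    so any step with the "iam" prefix is flagged. STS is data plane-only
    and is therefore not flagged.
    """
    flagged = []
    for step in steps:
        service, _, _action = step.partition(":")
        if service.strip().lower() == "iam":
            flagged.append(step)
    return flagged


if __name__ == "__main__":
    # Hypothetical recovery runbook: the two iam steps make it
    # non-statically-stable.
    runbook = [
        "sts:GetCallerIdentity",
        "iam:AttachRolePolicy",
        "sts:AssumeRole",
        "iam:CreateRole",
    ]
    for step in find_iam_control_plane_steps(runbook):
        print(f"depends on the IAM control plane: {step}")
```

A runbook that returns an empty list from this check uses only pre-provisioned principals and policies, which is the statically-stable pattern described above.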
AWS Organizations
The AWS Organizations control plane consists of all public Organizations APIs like AcceptHandshake, AttachPolicy, CreateAccount, CreatePolicy, and ListAccounts. There is not a data plane for AWS Organizations. It orchestrates the data plane for other services like IAM. During a control plane impairment, CRUDL-type operations for Organizations may not work, but the policies, like Service Control Policies (SCP) and Tag Policies, will continue to work and be evaluated as part of the IAM authorization process. Delegated admin capabilities and multi-account features in other AWS services that are supported by Organizations will also continue to work.
What this means is that when you are planning for dependencies on AWS Organizations, you should not rely on the Organizations control plane in your recovery path. Instead, implement static stability in your recovery plan. For example, a non-statically-stable approach might be to update SCPs to remove restrictions on allowed AWS Regions via the aws:RequestedRegion condition, or to enable admin permissions for specific IAM roles. This relies on the Organizations control plane to make these updates. A better approach would be to use session tags to grant the use of admin permissions. Your Identity Provider (IdP) can include session tags that can be evaluated against the aws:PrincipalTag condition, which helps you to dynamically configure permissions for certain principals while helping your SCPs to remain static. This removes dependencies on control planes and only utilizes data plane actions.
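To illustrate the session tag approach, the following sketch builds a static SCP that denies requests outside a set of allowed Regions unless the principal carries a specific session tag. The tag key break-glass, its value, the Region list, and the function name are all assumptions made for illustration. The policy is also simplified: a production Region-restriction SCP would typically exempt global services as well.

```python
import json
from typing import Dict, Iterable


def build_region_scp(allowed_regions: Iterable[str],
                     tag_key: str = "break-glass",   # hypothetical tag key
                     tag_value: str = "true") -> Dict:
    """Build a static SCP document that never needs updating in a failure.

    Both keys under the single StringNotEquals operator must match for
    the Deny to apply: the request is denied only when the requested
    Region is outside the allowed list AND the principal does not carry
    the expected session tag. A principal without the tag at all also
    satisfies the negated condition, so it stays restricted.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOtherRegionsUnlessTagged",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions),
                        f"aws:PrincipalTag/{tag_key}": tag_value,
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical Region list; attach the resulting SCP ahead of time.
    print(json.dumps(build_region_scp(["us-west-2", "us-east-1"]), indent=2))
```

With a policy like this attached in advance, lifting the Region restriction during an event only requires the IdP to issue a session that includes the tag, which does not involve the Organizations control plane.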
AWS Account Management
The AWS Account Management control plane is hosted in us-east-1 and consists of all public APIs for managing an AWS account, such as GetContactInformation and PutContactInformation. It also includes creating or closing a new AWS account through the management console. The APIs for CloseAccount, CreateAccount, CreateGovCloudAccount, and DescribeAccount are part of the AWS Organizations control plane, which is also hosted in us-east-1. Additionally, creating a GovCloud account outside of AWS Organizations relies on the AWS account management control plane in us-east-1. Also, GovCloud accounts must be 1:1 linked to an AWS account in the aws partition. Creating accounts in the aws-cn partition does not rely on us-east-1. The data plane for AWS accounts is the accounts themselves. During a control plane impairment, CRUDL-type operations (like creating a new account or getting and updating contact information) for AWS accounts may not work. References to the account in IAM policies will continue to work.
What this means is that when you are planning for dependencies on AWS Account Management, you should not rely on the Account Management control plane in your recovery path. The Account Management control plane doesn’t provide functionality that you would typically use directly in a recovery situation, but there may be times when you would need it. For example, a statically-stable design would be to pre-provision all of the AWS accounts you need for failover. A non-statically-stable design would be to create new AWS accounts during a failure event to host your DR resources.
Route 53 Application Recovery Controller
The control plane for Route 53 ARC consists of the APIs for recovery control and recovery readiness, as identified at: Amazon Route 53 Application Recovery Controller endpoints and quotas. You manage readiness checks, routing controls, and cluster operations by using the control plane. The data plane of ARC is your recovery cluster, which manages the routing control values that are queried by Route 53 health checks, and also implements the safety rules. The data plane functionality of Route 53 ARC is accessed through your recovery cluster APIs like https://aaaaaaaa.route53-recovery-cluster.eu-west-1.amazonaws.com.
What this means is that you shouldn’t rely on the Route 53 ARC control plane in your recovery path. There are two best practices that help implement this guidance:
- First, bookmark or hard code the five Regional cluster endpoints. This removes the need to use the DescribeCluster control plane operation during a failover scenario to discover the endpoint values.
- Second, use the Route 53 ARC cluster APIs by using the CLI or SDK to perform updates to routing controls and not the AWS Management Console. This removes the management console as a dependency for your failover plan and ensures it depends on only data plane actions.
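Both practices can be combined in a small failover script. The following is a minimal sketch, assuming boto3 is installed and that you have recorded your cluster's endpoints ahead of time. The endpoint list is a placeholder taken from the example URL above, and the function names and the routing control ARN are hypothetical. The script tries each hard-coded endpoint in turn and calls only the recovery cluster (data plane) API.

```python
from typing import Callable, Iterable, Tuple

# Placeholder values: replace with the (Region, URL) pairs recorded for
# your own cluster, and add the remaining four Regional endpoints.
CLUSTER_ENDPOINTS = [
    ("eu-west-1",
     "https://aaaaaaaa.route53-recovery-cluster.eu-west-1.amazonaws.com"),
]


def boto3_client_factory(region: str, url: str):
    """Create a data plane client for one cluster endpoint."""
    import boto3  # imported lazily so the rest of the sketch needs no SDK
    return boto3.client("route53-recovery-cluster",
                        region_name=region, endpoint_url=url)


def set_routing_control_state(make_client: Callable[[str, str], object],
                              endpoints: Iterable[Tuple[str, str]],
                              routing_control_arn: str,
                              state: str) -> str:
    """Set a routing control to "On" or "Off" via the first healthy endpoint.

    Returns the endpoint URL that accepted the update. No control plane
    call (such as DescribeCluster) is made.
    """
    if state not in ("On", "Off"):
        raise ValueError('state must be "On" or "Off"')
    errors = []
    for region, url in endpoints:
        try:
            client = make_client(region, url)
            client.update_routing_control_state(
                RoutingControlArn=routing_control_arn,
                RoutingControlState=state,
            )
            return url
        except Exception as exc:  # try the next Regional endpoint
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all cluster endpoints failed: " + "; ".join(errors))
```

A call such as set_routing_control_state(boto3_client_factory, CLUSTER_ENDPOINTS, routing_control_arn, "Off") would then shift traffic away from the affected cell. Passing the client factory as a parameter is a design choice that lets the retry logic be exercised in a game day or unit test without reaching AWS.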
AWS Network Manager
The AWS Network Manager service is primarily a control plane-only system hosted in us-west-2. Its purpose is to centrally manage the configuration of your AWS Cloud wide area networking (WAN) core network and your AWS Transit Gateway network across AWS accounts, Regions, and on-premises locations. It also aggregates your Cloud WAN metrics in us-west-2, which can also be accessed through the CloudWatch data plane. If Network Manager is impaired, the data plane of the services it orchestrates will not be impacted. The CloudWatch metrics for Cloud WAN are also available in us-west-2. If you want historical metric data, like bytes in and out per Region, to understand how much traffic might shift to other Regions during a failure impacting us-west-2, or for other operational purposes, you can export those metrics as CSV data directly from the CloudWatch console or using this method: Publish Amazon CloudWatch metrics to a CSV file. The data can be found under the AWS/Network Manager namespace and you can perform this on a schedule you choose and store it in S3 or in another data store you select. To implement a statically-stable recovery plan, do not use AWS Network Manager to make updates to your network, or rely on data from its control plane operations for failover input.
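As an illustration of the scheduled export, the following sketch converts the MetricDataResults list returned by the CloudWatch GetMetricData API into CSV text that could then be written to S3 or another store. It is not the method referenced above. The function name and the sample metric Id and label are hypothetical, and fetching the data and uploading the file are left out.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Dict, List


def metric_results_to_csv(metric_data_results: List[Dict]) -> str:
    """Flatten GetMetricData results into CSV text.

    Each result is expected to carry "Id", "Timestamps", and "Values",
    with an optional "Label", which is the shape of the MetricDataResults
    entries in a GetMetricData response. Rows are sorted by timestamp
    within each metric.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["metric_id", "label", "timestamp", "value"])
    for result in metric_data_results:
        points = sorted(zip(result["Timestamps"], result["Values"]))
        for timestamp, value in points:
            writer.writerow([result["Id"], result.get("Label", ""),
                             timestamp.isoformat(), value])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data standing in for a GetMetricData response.
    sample = [{
        "Id": "bytes_in",
        "Label": "BytesIn",
        "Timestamps": [datetime(2024, 1, 1, 1, tzinfo=timezone.utc),
                       datetime(2024, 1, 1, 0, tzinfo=timezone.utc)],
        "Values": [20.0, 10.0],
    }]
    print(metric_results_to_csv(sample), end="")
```

Keeping the exported files outside us-west-2 means the historical data remains available as failover input even if Network Manager or its Region is impaired.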
Route 53 Private DNS
Route 53 private hosted zones are supported in each partition; however, the considerations for private hosted zones and public hosted zones in Route 53 are the same. Refer to Amazon Route 53 in Appendix B - Edge network global service guidance.