Multi-Region fundamental 3: Understanding your workload dependencies
A specific workload might have several dependencies in a Region, such as AWS services used, internal dependencies, third-party dependencies, network dependencies, certificates, keys, secrets, and parameters. To ensure operation of the workload during a failure scenario, there should be no dependencies between the primary Region and the standby Region; each should be able to operate independently of one another. To achieve this, all dependencies in the workload must be scrutinized to ensure they are available within each Region. This is required because a failure in the primary Region should not have an impact in the standby Region. In addition, knowledge of how the workload operates when a dependency is in a degraded state or completely unavailable is imperative, so that solutions can be engineered to handle this appropriately.
3a: AWS services
When designing a multi-Region architecture, you need to understand the specific AWS services that will be used. The first aspect is understanding what features each service offers to support multi-Region operation, and whether a solution must be engineered to accomplish the multi-Region goals. For example, Amazon Aurora and Amazon DynamoDB each provide a feature to asynchronously replicate data to a standby Region. The second aspect is availability: every AWS service the workload depends on must be available in all Regions that the workload is going to run from. To confirm that the services you plan to use are available in the desired Regions, review the AWS Regional Services List.
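The availability review can be made repeatable by comparing the services a workload requires against what each target Region offers. The following is a minimal sketch, assuming the availability data has already been collected (for example, from the AWS Regional Services List). The function name `missing_services` and the sample data are hypothetical and do not reflect real service availability.

```python
from typing import Dict, Iterable, Set


def missing_services(
    required: Iterable[str],
    regions: Iterable[str],
    availability: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per Region, the required services that are not available there.

    `availability` maps a service name to the set of Regions offering it.
    Regions with no gaps are omitted from the result.
    """
    required = set(required)
    gaps: Dict[str, Set[str]] = {}
    for region in regions:
        absent = {
            svc for svc in required
            if region not in availability.get(svc, set())
        }
        if absent:
            gaps[region] = absent
    return gaps


# Hypothetical sample data, not real availability information.
sample_availability = {
    "dynamodb": {"us-east-1", "us-west-2"},
    "aurora": {"us-east-1", "us-west-2"},
    "example-service": {"us-east-1"},
}

gaps = missing_services(
    required=["dynamodb", "aurora", "example-service"],
    regions=["us-east-1", "us-west-2"],
    availability=sample_availability,
)
```

In this sample, `gaps` reports that `example-service` is missing from `us-west-2`, which would need to be resolved before that Region could serve as a standby.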
3b: Internal and third-party dependencies
For any internal dependencies that a workload has, ensure they are available from the Regions the workload will operate out of. For example, if the workload is composed of many microservices, identify all of the microservices that make up each business capability. From there, ensure that all of those microservices are deployed in each Region the workload will operate out of.
Cross-Region calls between microservices within a workload are not advised, and Regional isolation should be maintained. Cross-Region dependencies add the risk of correlated failure, which negates the benefits you are trying to achieve with isolated Regional implementations of the workload. On-premises dependencies might be part of the workload as well, so it is imperative to understand how the characteristics of these integrations could change if the primary Region were to change. For example, if the standby Region is located farther from the on-premises environment, the increased latency will have a negative impact.
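Regional isolation between microservices can be checked mechanically if the workload keeps a record of where each deployed service sends its downstream calls. The sketch below assumes such a record exists as a simple mapping. The `Deployment` shape, the function name `cross_region_calls`, and the service names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical deployment map:
# (service, Region it runs in) -> {downstream service: Region it is called in}
Deployment = Dict[Tuple[str, str], Dict[str, str]]


def cross_region_calls(deployment: Deployment) -> List[Tuple[str, str, str, str]]:
    """List (caller, caller_region, callee, callee_region) for every call
    that leaves the caller's own Region."""
    violations = []
    for (service, region), downstream in sorted(deployment.items()):
        for callee, callee_region in sorted(downstream.items()):
            if callee_region != region:
                violations.append((service, region, callee, callee_region))
    return violations


sample_deployment: Deployment = {
    ("orders", "us-east-1"): {"payments": "us-east-1", "inventory": "us-east-1"},
    # The us-west-2 copy of "orders" still calls "inventory" in us-east-1.
    ("orders", "us-west-2"): {"payments": "us-west-2", "inventory": "us-east-1"},
}

violations = cross_region_calls(sample_deployment)
```

Here the standby copy of `orders` would be flagged, because a failure in `us-east-1` would also break it in `us-west-2`.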
Understanding Software as a Service (SaaS) solutions, software development kits (SDKs), and other third-party product dependencies, and being able to exercise scenarios where these dependencies are either degraded or unavailable, will provide more insight into how the chain of systems operates and behaves under different failure modes. These dependencies could be within application code, for example in how secrets are managed externally using AWS Secrets Manager.
Redundancy in dependencies can increase resilience. There is also the possibility that a SaaS solution or third-party dependency is using the same primary AWS Region as the workload. If this is the case, work with the vendor to determine whether their resilience posture matches the requirements of the workload.
Additionally, be aware of shared fate between the workload and its dependencies, such as third-party applications. If the dependencies are not available in (or from) a secondary Region after a failover, the workload might not recover fully.
3c: Failover mechanism
The Domain Name System (DNS) is commonly used as a failover mechanism to shift traffic
away from the primary Region to a standby Region. Critically review and scrutinize all
dependencies the failover mechanism takes. For example, if your workload is using Amazon Route 53
As discussed in the internal dependency section, all microservices that are part of a business capability need to be available in each Region in which the workload is deployed. As part of the failover strategy, the microservices of a business capability need to fail over together to remove the chance of cross-Region calls. If microservices instead fail over independently, some of them might end up making cross-Region calls, which introduces latency and could make the workload unavailable if clients time out.
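The idea of failing over a business capability as a unit can be illustrated with a small routing-table model. This is a sketch only: it assumes a mapping from each capability to its microservices and a mapping from each microservice to its currently active Region. The function names and the `checkout` capability are hypothetical, and the sketch does not represent any particular AWS failover service.

```python
from typing import Dict, List


def fail_over_capability(
    capability: str,
    capabilities: Dict[str, List[str]],
    active_region: Dict[str, str],
    target_region: str,
) -> Dict[str, str]:
    """Return a new routing table in which every microservice of
    `capability` is active in `target_region`."""
    updated = dict(active_region)
    for service in capabilities[capability]:
        updated[service] = target_region
    return updated


def split_capabilities(
    capabilities: Dict[str, List[str]], active_region: Dict[str, str]
) -> List[str]:
    """Return capabilities whose microservices are active in more than one Region."""
    return sorted(
        name
        for name, services in capabilities.items()
        if len({active_region[s] for s in services}) > 1
    )


capabilities = {"checkout": ["cart", "payments", "orders"]}
all_primary = {"cart": "us-east-1", "payments": "us-east-1", "orders": "us-east-1"}

# Independent failover of a single microservice splits the capability.
partial = dict(all_primary, payments="us-west-2")
split_after_partial = split_capabilities(capabilities, partial)

# Coordinated failover moves the whole capability together.
coordinated = fail_over_capability("checkout", capabilities, all_primary, "us-west-2")
split_after_coordinated = split_capabilities(capabilities, coordinated)
```

A check like `split_capabilities` could run after any failover action to confirm that no capability is left straddling two Regions.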
3d: Configuration dependencies
Certificates, keys, secrets, and parameters are part of the dependency analysis needed when designing for multi-Region. Whenever possible, localize these components within each Region so that they do not share fate across Regions. For certificates, stagger the expiration dates, ideally so that each Region's certificates expire at different times and with alarms set to notify in advance, to avoid a scenario where a single expiring certificate impacts multiple Regions.
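Staggered certificate expiration can be verified with a simple date comparison. The following sketch assumes the expiration date of each Region's certificate is already known. The function name `clustered_expirations`, the 30-day threshold, and the dates are hypothetical choices for illustration.

```python
from datetime import date, timedelta
from itertools import combinations
from typing import Dict, List, Tuple


def clustered_expirations(
    expirations: Dict[str, date], min_gap: timedelta
) -> List[Tuple[str, str]]:
    """Return pairs of Regions whose certificates expire less than
    `min_gap` apart, meaning one missed renewal window could hit both."""
    pairs = []
    for (r1, d1), (r2, d2) in combinations(sorted(expirations.items()), 2):
        if abs(d1 - d2) < min_gap:
            pairs.append((r1, r2))
    return pairs


# Hypothetical expiration dates per Region.
sample_expirations = {
    "us-east-1": date(2025, 6, 1),
    "us-west-2": date(2025, 6, 3),
    "eu-west-1": date(2025, 9, 1),
}

too_close = clustered_expirations(sample_expirations, min_gap=timedelta(days=30))
```

In this sample, the certificates in `us-east-1` and `us-west-2` expire two days apart and would be flagged for rescheduling.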
Encryption keys and secrets should be Region-specific as well. That way, if there is an error in rotation of a key or secret, the impact is limited to a specific Region.
Lastly, any workload parameters should be stored in each Region so that the workload retrieves them locally, within the Region where it is running.
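One way to check that keys, secrets, and parameters are Region-local is to inspect the Region field of the ARNs a deployment references. The sketch below relies on the general ARN layout `arn:partition:service:region:account-id:resource`. The configuration keys, resource names, and account ID are hypothetical, and ARNs with an empty Region field are skipped rather than evaluated.

```python
from typing import Dict, List


def arn_region(arn: str) -> str:
    """Extract the Region field from an ARN
    (arn:partition:service:region:account-id:resource)."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


def non_local_references(config: Dict[str, str], local_region: str) -> List[str]:
    """Return config keys whose ARN points at a Region other than `local_region`.

    ARNs with an empty Region field are skipped; this sketch does not
    evaluate them.
    """
    return sorted(
        key
        for key, arn in config.items()
        if arn_region(arn) not in ("", local_region)
    )


# Hypothetical configuration for the us-west-2 deployment of a workload.
sample_config = {
    "db_secret": "arn:aws:secretsmanager:us-west-2:111122223333:secret:db-password",
    "kms_key": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    "feature_flags": "arn:aws:ssm:us-west-2:111122223333:parameter/app/feature-flags",
}

non_local = non_local_references(sample_config, local_region="us-west-2")
```

Here `kms_key` would be flagged, because the `us-west-2` deployment depends on an encryption key that lives in `us-east-1`.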
Key guidance
- A multi-Region architecture benefits from physical and logical separation between Regions. Introducing cross-Region dependencies at the application layer breaks this benefit. Avoid such dependencies.
- Failover controls should work with no dependencies on the primary Region.
- Failover should be coordinated at the business capability level to avoid the added latency and dependency of cross-Region calls.