Adopting a consistent design decision approach - Cloud-Driven Enterprise Transformation on AWS

Adopting a consistent design decision approach

An important attribute of AWS is its flexibility: designs can evolve over time, and components can scale elastically as business needs change or as traffic departs from historical data or forecasts.

Architecture on AWS does not need to be perfect. An important mental model shift is required from the legacy on-premises world, where workloads need to be forecast for three to five years plus a safety buffer for hardware sizing, and the architecture must be “future-proof” to support unforeseen changes and requirements. On AWS this is no longer the case.

V1 architecture and hardware performance must meet your current requirements with a safety factor. Teams can architect auto-scaling capability into V1 to burst capacity when required, but even where auto-scaling is not supported by the current application architecture, server capacity can be scaled vertically, on the fly, with minimal to no downtime to meet unexpected traffic demands. (See the V1 AWS design section of this document.)

When in doubt, teams can over-size hardware, allow systems to run for a couple of months on AWS, then down-size or optimize the configuration and service costs after real-world performance on AWS is determined with data. For all of these reasons, teams should invest time in a V1 AWS design that provides the right level of incremental improvements based on business need.
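As a rough illustration of the over-size, observe, then right-size approach, the sketch below takes CPU utilization samples collected after a period of running on AWS and suggests a size. The size ladder, the 60% target, the name `recommend_size`, and the simplification that utilization doubles when capacity is halved are all assumptions made for this example. They do not come from this document or from any AWS API.

```python
from statistics import quantiles

# Hypothetical size ladder, smallest to largest. Assumption: each step
# up doubles capacity, so utilization halves (and doubles stepping down).
SIZE_LADDER = ["large", "xlarge", "2xlarge", "4xlarge"]


def recommend_size(current: str, cpu_samples: list[float],
                   target_p95: float = 60.0) -> str:
    """Suggest a size whose projected p95 CPU utilization (percent)
    stays at or below target_p95, based on observed samples."""
    # quantiles(..., n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = quantiles(cpu_samples, n=20)[-1]
    idx = SIZE_LADDER.index(current)

    # Down-size while the projected (doubled) utilization still fits the target.
    while idx > 0 and p95 * 2 <= target_p95:
        idx -= 1
        p95 *= 2

    # Up-size while the observed or projected utilization exceeds the target.
    while idx < len(SIZE_LADDER) - 1 and p95 > target_p95:
        idx += 1
        p95 /= 2

    return SIZE_LADDER[idx]


if __name__ == "__main__":
    # A deliberately over-sized host that has been idling at about 10% CPU.
    print(recommend_size("4xlarge", [10.0] * 100))
```

In practice the samples would come from whatever monitoring data the team already collects, and memory, I/O, and network metrics would be weighed alongside CPU.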

Table 4 – Traditional vs. modern views of IT architecture decisions

Traditional view: Architecture as one-way door

  • Design for 5-year growth forecast (“one and done”)

  • Over-build to anticipate/guess future use patterns

  • Led by infrastructure: business not involved

  • Tool and technology focus

Modern view: Modernize what matters with a “V1 AWS design” architecture approach

  • Architect for cost, performance, and operations

  • “Two-pizza teams” empowered to make decisions

  • Based on business need, quickly determine first step

  • Optimization backlog on V1 drives design evolution over time

V1 AWS design

Design decisions regarding the degree of transformation in application and infrastructure architecture, configuration, and the degree of incremental deployment automation manifest themselves in the “V1 AWS design” implemented with the production cutover to AWS. V1 AWS design is the first architecture design on AWS that represents the first, best step to the cloud. V1 AWS design can range from “like for like” to full modernization or cloud native re-factoring, depending on business value and alternate uses of time and resources.

Cloud architecture decisions are best driven with the application owners, developers, and business partners. A good V1 AWS design document paints a clear path forward for V1 deployment. The V1 design weighs trade-offs to determine which changes to the operating system (OS), database, application code, resiliency architecture, deployment automation, and services architecture should be implemented with the migration to AWS (and incorporated in the V1 design), and which should be deferred to the backlog for post-migration optimization on AWS.

A move to the cloud permits a departure from the old approach of designing once and maintaining for years. Instead, there is a shift to frequent redesign, driven by continuous improvement and ongoing cost-aware optimization. Teams sometimes perceive a false trade-off between “lift and shift,” where they replicate the current on-premises architecture in the cloud, and a full “cloud native” modernization of the application (such as a move to microservices). In some cases, it may be worth the time and effort to fully refactor an application to meet specific business needs, including scalability, performance, resiliency, or other objectives. However, a full spectrum of V1 design options lies between “lift and shift” and full cloud native modernization. Some examples of design improvements short of full cloud-native architecture include:

  • Upgrade End of Life (EOL) or near-EOL versions of OS, middleware, and application components that expose security risks

  • Move to managed services such as load balancer, database, or caching service

  • Improve resiliency with multi-AZ architecture and self-healing components

  • Automate full-stack builds and deployments to support faster time to market

  • Expand test automation and coverage

  • Add auto-scaling (for horizontal scaling or host swaps) and parallel processing for high-frequency workloads

  • Use specialized instances for I/O, storage, or memory intensive workloads

  • Support blue/green deployments

  • Add automated backup and archival functions, or automate disaster recovery / failover

  • Supplement core operations tasks using AWS Lambda and AWS Step Functions

  • Expand use of APIs

  • Enhance data security and protection (such as Protected Health Information)

These should all be design considerations as you deploy on AWS. Critical changes should be implemented before moving to AWS. Less critical changes can be included in the optimization backlog to address after migration to AWS is complete.
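The split between V1 scope and the optimization backlog can be sketched as a simple triage. This is a minimal illustration and not a prescribed method: the `Improvement` fields, the `business_value` score, and the rule of ordering the backlog by that score are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Improvement:
    name: str
    critical: bool       # must be done before moving to AWS (assumed flag)
    business_value: int  # hypothetical relative score, higher is better


def triage(items: list[Improvement]) -> tuple[list[str], list[str]]:
    """Return (v1_scope, backlog). Critical items go into V1; the rest
    go to the backlog, ordered by descending business value."""
    v1_scope = [i.name for i in items if i.critical]
    deferred = [i for i in items if not i.critical]
    backlog = [i.name for i in sorted(deferred,
                                      key=lambda i: i.business_value,
                                      reverse=True)]
    return v1_scope, backlog


if __name__ == "__main__":
    candidates = [
        Improvement("Upgrade EOL operating system", True, 8),
        Improvement("Add blue/green deployments", False, 5),
        Improvement("Move database to a managed service", False, 7),
    ]
    v1, backlog = triage(candidates)
    print("V1:", v1)
    print("Backlog:", backlog)
```

A real assessment would also weigh effort, available skills, and program timelines, which this sketch leaves out.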

The question is not “what is the best system architecture?” but rather “what is the best first step for AWS V1 design based on business objectives?” The time and effort to achieve these V1 improvements should be evaluated on alignment with business needs, available skills, and resources, and compared with other opportunities to invest in V1 improvements. Any backlog items can be included in future sprint objectives using a continuous improvement mindset.

Speed over perfection

The most critical aspect of transformation acceleration is to have an efficient decision-making process in place to set V1 design. Program acceleration objectives, cost / benefit trade-offs, and new capability delivery requirements will inform the degree of incremental change necessary for an appropriate V1 configuration.

It is essential to involve all relevant stakeholders and make the V1 AWS design decision quickly, then implement the agreed-upon plan and defer remaining optimization opportunities to the post-migration AWS backlog. The sooner application development teams are involved in designing, building, and operating on AWS, the more rapidly the promise and potential of operating on the AWS Cloud can be achieved. Think speed to market, experimentation, failing fast, delighting customers, reacting to market conditions, continuous improvement, and so on.

Business and application teams that can allocate 1-2 monthly release windows (depending on the complexity of the systems) in the short term to design, build, and deploy on AWS can experiment, innovate, and move faster after their business application portfolios are migrated to AWS. An investment in a “deep dive” on design with broad stakeholders will go a long way toward accelerating later build, test, and deploy steps in the migration process.

There are scenarios where there is a business case to complete application transformation prior to going live on AWS. Examples include assets that are beyond EOL and require an upgrade, refactoring code to use a global pipeline, creating an API layer, or application code changes required to support multi-AZ deployment for tier 1 business-critical applications.

In any scenario where significant work is required beyond a “like for like” migration, that work should be scheduled, budgeted, and planned, with the cutover to AWS timed to follow its completion. If there are no resources available to complete the required work within the first two years of the program, teams should consider securing an exception if required, moving to AWS first, then implementing the required upgrades on AWS.

Is there value and capacity to transform? Do that. Are there competing priorities and limited capacity? Pick the simple path and move. Do not oscillate on what will be your first step to the cloud. Pick a path and take it. If it fails, implement your failback procedures, adjust, and try again. If you fail, you learn. You have made progress, rather than analyzing all your options and producing no incremental value.

Cutover to AWS is a “two-way door”. You have not made a hardware purchase decision that you’ll have to live with for years. This is very liberating, and it unlocks team creativity when teams understand that they do not have to get every decision exactly right.

Performance will tell you right away whether you have succeeded, first in testing and again when you go live. You’ll learn more in production cutover. It’s very easy either to adjust on AWS (upsize on the fly) or to fail back and try again. What teams should not do is belabor or second-guess V1 AWS design decisions.

Move thoughtfully but quickly through this process. Launch the build and test phases. Make adjustments if needed based on test cases, then deploy and tune, adjust, and optimize after deployment on AWS. In the rare case it does not work at all as expected, run your well-documented failback procedures, adjust your design, run code to deploy a new configuration, and try it again.

Pattern-based design approach

Based on the authors’ customer engagements with AWS Professional Services, customers find that over 80% of their workloads can fit into a small number (roughly 10 to 15) of reusable architecture patterns that support common application types. After these patterns are designed and approved, let teams run with them. Teams can tailor and customize incrementally as needed, but subject matter experts (SMEs) and lead cloud architects need only review the incremental changes before build teams commence their work.

If build teams select a base template that does not require modification, they can begin build steps without further review. Review only the necessary changes to the approved patterns, which keeps the review process lightweight, and empower teams to move fast with management by exception. Don’t build a large variety of patterns when it is not required. Keep it simple wherever you can. Save the redesign effort for custom workloads (“big rocks”) that require significant change (such as mainframe application refactoring).
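One way to picture management by exception is to compare a team's proposed configuration with the approved pattern and surface only the differences for review. The sketch below assumes patterns can be expressed as flat key/value settings. The pattern name, its keys, and the function `changes_needing_review` are hypothetical and exist only for this example.

```python
# Hypothetical catalog of approved patterns (illustrative names and settings).
APPROVED_PATTERNS = {
    "three-tier-web": {
        "load_balancer": "managed",
        "database": "managed-multi-az",
        "auto_scaling": True,
    },
}


def changes_needing_review(pattern_name: str, proposed: dict) -> dict:
    """Return {setting: (approved_value, proposed_value)} for every setting
    that differs from the approved pattern. An empty result means the team
    is using the base template unmodified and can start building."""
    base = APPROVED_PATTERNS[pattern_name]
    all_keys = base.keys() | proposed.keys()
    return {
        key: (base.get(key), proposed.get(key))
        for key in sorted(all_keys)
        if base.get(key) != proposed.get(key)
    }


if __name__ == "__main__":
    proposed = dict(APPROVED_PATTERNS["three-tier-web"], cache="managed")
    diff = changes_needing_review("three-tier-web", proposed)
    if diff:
        print("Send to SME review:", diff)
    else:
        print("No deviations; build can begin.")
```

Only the returned differences would go to SMEs and lead cloud architects, so an unmodified template passes straight through to build.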