# Guidance for Generation Interconnection Simulation on AWS

## Overview

This Guidance shows how to use AWS services to host generation interconnection simulations, such as production cost modeling, on AWS. Due to the variability and unpredictability of renewable energy sources, integrating them into the grid requires considerable analysis. While many simulation tools aid in grid planning, they often run on local servers, limiting their performance for increasingly complex simulations. By hosting simulations on the scalable and reliable AWS infrastructure, you can reduce complex-simulation run time, avoid interruptions and restarts, and meet dynamic demand to accelerate your renewable energy transition.

## How it works

The architecture diagram below illustrates the key components of this Guidance and how they interact, providing a step-by-step overview of the solution's structure and functionality.

[Download the architecture diagram](https://d1.awsstatic.com/solutions/guidance/architecture-diagrams/generation-interconnection-simulation-on-aws.pdf)

![Architecture diagram](/images/solutions/generation-interconnection-simulation-on-aws/images/generation-interconnection-simulation-on-aws-1.png)

1. **Step 1**: Use AWS Amplify to build a simple, full-stack web application and authenticate users with Amazon Cognito. Upload and download data stored in Amazon Simple Storage Service (Amazon S3). Invoke an AWS Lambda function to preprocess input data for the generation interconnection simulation, and start an AWS Step Functions workflow.
1. **Step 2**: Use Step Functions to create a workflow that submits the simulation job, automates job batching, and monitors job status.
1. **Step 3**: Configure AWS ParallelCluster with the required software and dependencies to run generation interconnection simulations with job schedulers. Admins can interact with the high performance computing (HPC) cluster using the `pcluster` command line interface (CLI) and the ParallelCluster UI (available from ParallelCluster version 3.5.0). NICE DCV is also included in ParallelCluster.
1. **Step 4**: Use a job scheduler with a built-in queue to optimize generation interconnection simulation tasks depending on the job attributes (such as the number of tasks or priority) and the compute environment. AWS Batch and Slurm are natively supported. Alternatives are Terascale Open-Source Resource and Queue Manager (TORQUE) and HTCondor.
1. **Step 5**: Schedulers distribute jobs across multiple nodes of a compute fleet. Amazon EC2 Auto Scaling is configured to scale compute capacity dynamically according to the number of jobs scheduled. Compute-optimized instances, such as Amazon EC2 C7i or C7a instances, are recommended for the compute nodes.
1. **Step 6**: Use Amazon FSx for NetApp ONTAP or Amazon FSx for OpenZFS as a high-performance file system to process and store intermediate results generated by the generation interconnection simulation software. Amazon S3 can be used to store the output files.
1. **Step 7**: Use AWS DataSync to move a selected portion of data from Amazon FSx to Amazon S3 for output visualization.
1. **Step 8**: Use EC2 Image Builder and predefined AWS CloudFormation templates to manage the image for the cluster head node and the compute node for continuous integration and continuous delivery (CI/CD).
1. **Step 9**: Use Amazon Simple Notification Service (Amazon SNS) and Amazon CloudWatch to monitor the cluster and notify users of simulation job status changes, such as starting and completion.
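Steps 3 through 5 are typically captured in a ParallelCluster cluster configuration. The fragment below is a hypothetical sketch under ParallelCluster 3.x conventions; the Region, subnet IDs, key name, volume ID, queue names, and sizing are placeholders to adapt to your environment.

```yaml
# Hypothetical ParallelCluster (v3.x) configuration sketch: Slurm scheduler
# with a compute-optimized queue that scales between 0 and 64 instances.
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: c7i.2xlarge
  Networking:
    SubnetId: subnet-00000000000000000
  Ssh:
    KeyName: my-key-pair
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: simulation
      ComputeResources:
        - Name: c7i-nodes
          InstanceType: c7i.8xlarge
          MinCount: 0        # scale to zero when no jobs are queued
          MaxCount: 64
      Networking:
        SubnetIds:
          - subnet-00000000000000000
SharedStorage:
  - Name: scratch
    StorageType: FsxOpenZfs
    MountDir: /shared
    FsxOpenZfsSettings:
      VolumeId: fsvol-00000000000000000
```

With `MinCount: 0`, the scheduler's Auto Scaling behavior described in Step 5 releases all compute nodes when the queue is empty.
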
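The workflow in Steps 1 and 2 can be sketched as an Amazon States Language (ASL) definition that preprocesses input data with Lambda, fans simulation cases out with a `Map` state, and submits each batch as an AWS Batch job. This is a minimal illustration, not the Guidance's actual workflow; the state names, concurrency limit, and ARN parameters are hypothetical placeholders.

```python
import json

# Minimal sketch of an Amazon States Language (ASL) workflow:
# preprocess input data, then run one AWS Batch job per batch of
# simulation cases. All ARNs and parameter names are placeholders.
def build_workflow_definition(preprocess_lambda_arn: str,
                              job_queue_arn: str,
                              job_definition_arn: str) -> str:
    definition = {
        "Comment": "Generation interconnection simulation workflow (sketch)",
        "StartAt": "PreprocessInput",
        "States": {
            "PreprocessInput": {
                "Type": "Task",
                "Resource": preprocess_lambda_arn,
                "ResultPath": "$.batches",
                "Next": "RunSimulationBatches",
            },
            "RunSimulationBatches": {
                "Type": "Map",               # one iteration per job batch
                "ItemsPath": "$.batches",
                "MaxConcurrency": 10,
                "Iterator": {
                    "StartAt": "SubmitBatchJob",
                    "States": {
                        "SubmitBatchJob": {
                            "Type": "Task",
                            # The .sync integration waits for the Batch job
                            # to finish, so the Map state also monitors jobs.
                            "Resource": "arn:aws:states:::batch:submitJob.sync",
                            "Parameters": {
                                "JobName.$": "$.batchId",
                                "JobQueue": job_queue_arn,
                                "JobDefinition": job_definition_arn,
                            },
                            "End": True,
                        }
                    },
                },
                "End": True,
            },
        },
    }
    return json.dumps(definition, indent=2)
```

The resulting JSON could be passed to Step Functions when creating the state machine; job-status notifications (Step 9) would then hook into the execution's state transitions.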

## Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

### Operational Excellence

Amplify lets you quickly and securely set up and manage a serverless UI for the HPC cluster, and Step Functions helps you visualize and control the workflow that orchestrates job steps. CloudWatch monitors the cluster’s performance through collected metrics, helping you gain insights into the operation. And by using CloudFormation, you can use infrastructure as code to provision the environment, limiting human errors and increasing the consistency of event responses. All of these services are fully managed by AWS. [Read the Operational Excellence whitepaper](/wellarchitected/latest/operational-excellence-pillar/welcome.html)


### Security

Cognito provides frictionless customer identity and access management for the frontend and enables user pools as well as federated login and access. Federated access lets you use existing identities and permissions, and provide a uniform user experience with the same level of security as used by the rest of your company. By scoping AWS Identity and Access Management (IAM) policies according to the least-privilege principle, you can limit unauthorized access to resources. [Read the Security whitepaper](/wellarchitected/latest/security-pillar/welcome.html)
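As an illustration of scoping IAM policies to least privilege, the policy sketch below grants a simulation role read/write access only to an `inputs/` prefix of a single bucket. The bucket name and prefix are hypothetical placeholders.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ScopedSimulationDataAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-simulation-bucket/inputs/*"
    },
    {
      "Sid": "ListOnlyThisPrefix",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-simulation-bucket",
      "Condition": {
        "StringLike": { "s3:prefix": "inputs/*" }
      }
    }
  ]
}
```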


### Reliability

EC2 Auto Scaling distributes Amazon EC2 instances evenly across multiple Availability Zones (AZs) to increase fault tolerance and availability. It can detect when an instance is unhealthy, terminate it, and launch a replacement. Additionally, if one AZ becomes unavailable, EC2 Auto Scaling can launch instances in another AZ to compensate. Amazon FSx, which supports the HPC application’s high input/output operations per second (IOPS) and large throughput, can also be deployed across multiple AZs, providing enhanced durability by synchronously replicating data between them. It also enhances availability during both planned system maintenance and unplanned service disruption by failing over automatically to the standby AZ. This protects data against instance failure and AZ disruption. Finally, Amazon S3 provides persistent and reliable storage for input and output data. [Read the Reliability whitepaper](/wellarchitected/latest/reliability-pillar/welcome.html)


### Performance Efficiency

ParallelCluster dynamically spins instances up and down to meet demand by using an EC2 Auto Scaling group, which helps ensure that resources are right-sized for the workload. Amazon FSx can process massive data sets with hundreds of gigabytes per second of throughput, millions of IOPS, and submillisecond latencies. [Read the Performance Efficiency whitepaper](/wellarchitected/latest/performance-efficiency-pillar/welcome.html)


### Cost Optimization

Step Functions and Lambda help minimize costs through their event-driven pattern: no costs are incurred when no jobs are submitted. Additionally, ParallelCluster uses an EC2 Auto Scaling group and launch template to spin up only the instances needed for submitted jobs, avoiding idle resources and waste, and you can choose the most cost-effective instance type based on your performance benchmarking and resource utilization rate. CloudWatch monitors usage and delivers logs and insights that can help you right-size your fleet instances and operate cost-aware workloads. [Read the Cost Optimization whitepaper](/wellarchitected/latest/cost-optimization-pillar/welcome.html)


### Sustainability

Amazon S3 Intelligent-Tiering monitors access patterns and moves objects among tiers automatically, balancing cost and energy reduction against access efficiency. EC2 Auto Scaling helps you dynamically scale the compute fleet of the HPC cluster to avoid resource idling, resulting in a more efficient and sustainable solution. Additionally, Step Functions and Lambda only operate in response to job submissions and don’t run during the HPC cluster’s idle time, thereby reducing the required resources and decreasing the environmental impact of your workloads. [Read the Sustainability whitepaper](/wellarchitected/latest/sustainability-pillar/sustainability-pillar.html)


[Read usage guidelines](/solutions/guidance-disclaimers/)

