

本文為英文版的機器翻譯版本，如內容有任何歧義或不一致之處，概以英文版為準。

# 使用 Amazon EC2 Auto Scaling 群組建立機群基礎設施
<a name="create-auto-scaling"></a>

本節說明如何建立 Amazon EC2 Auto Scaling 機群。

使用以下 CloudFormation YAML 範本建立 Amazon EC2 Auto Scaling (Auto Scaling) 群組、具有兩個子網路的 Amazon Virtual Private Cloud (Amazon VPC)、執行個體描述檔和執行個體存取角色。這些是在子網路中使用 Auto Scaling 啟動執行個體時需要。

您應該檢閱並更新執行個體類型的清單，以符合您的轉譯需求。

如需 CloudFormation YAML 範本中使用的資源和參數的完整說明，請參閱*AWS CloudFormation 《 使用者指南*》中的[截止日期雲端資源類型參考](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/AWS_Deadline.html)。

**建立 Amazon EC2 Auto Scaling 機群**

1. 使用下列範例來建立定義 `FarmID`、 `FleetID`和 `AMIId` 參數的 CloudFormation 範本。將範本儲存至本機電腦上`.YAML`的檔案。

   ```
   AWSTemplateFormatVersion: 2010-09-09
   Description: Amazon Deadline Cloud customer-managed fleet
   Parameters:
     FarmId:
       Type: String
       Description: Farm ID
     FleetId:
       Type: String
       Description: Fleet ID
     AMIId:
       Type: String
       Description: AMI ID for launching workers
   Resources:
     deadlineVPC:
       Type: 'AWS::EC2::VPC'
       Properties:
         CidrBlock: 100.100.0.0/16
     deadlineWorkerSecurityGroup:
       Type: 'AWS::EC2::SecurityGroup'
       Properties:
         GroupDescription: !Join
           - ' '
           - - Security group created for Deadline Cloud workers in the fleet
             - !Ref FleetId
         GroupName: !Join
           - ''
           - - deadlineWorkerSecurityGroup-
             - !Ref FleetId
         SecurityGroupEgress:
           - CidrIp: 0.0.0.0/0
             IpProtocol: '-1'
         SecurityGroupIngress: []
         VpcId: !Ref deadlineVPC
     deadlineIGW:
       Type: 'AWS::EC2::InternetGateway'
       Properties: {}
     deadlineVPCGatewayAttachment:
       Type: 'AWS::EC2::VPCGatewayAttachment'
       Properties:
         VpcId: !Ref deadlineVPC
         InternetGatewayId: !Ref deadlineIGW
     deadlinePublicRouteTable:
       Type: 'AWS::EC2::RouteTable'
       Properties:
         VpcId: !Ref deadlineVPC
     deadlinePublicRoute:
       Type: 'AWS::EC2::Route'
       Properties:
         RouteTableId: !Ref deadlinePublicRouteTable
         DestinationCidrBlock: 0.0.0.0/0
         GatewayId: !Ref deadlineIGW
       DependsOn:
         - deadlineIGW
         - deadlineVPCGatewayAttachment
     deadlinePublicSubnet0:
       Type: 'AWS::EC2::Subnet'
       Properties:
         VpcId: !Ref deadlineVPC
         CidrBlock: 100.100.16.0/22
         AvailabilityZone: !Join
           - ''
           - - !Ref 'AWS::Region'
             - a
     deadlineSubnetRouteTableAssociation0:
       Type: 'AWS::EC2::SubnetRouteTableAssociation'
       Properties:
         RouteTableId: !Ref deadlinePublicRouteTable
         SubnetId: !Ref deadlinePublicSubnet0
     deadlinePublicSubnet1:
       Type: 'AWS::EC2::Subnet'
       Properties:
         VpcId: !Ref deadlineVPC
         CidrBlock: 100.100.20.0/22
         AvailabilityZone: !Join
           - ''
           - - !Ref 'AWS::Region'
             - c
     deadlineSubnetRouteTableAssociation1:
       Type: 'AWS::EC2::SubnetRouteTableAssociation'
       Properties:
         RouteTableId: !Ref deadlinePublicRouteTable
         SubnetId: !Ref deadlinePublicSubnet1
     deadlineInstanceAccessAccessRole:
       Type: 'AWS::IAM::Role'
       Properties:
         RoleName: !Join
           - '-'
           - - deadline
             - InstanceAccess
             - !Ref FleetId
         AssumeRolePolicyDocument:
           Statement:
             - Effect: Allow
               Principal:
                 Service: ec2.amazonaws.com
               Action:
                 - 'sts:AssumeRole'
         Path: /     
         ManagedPolicyArns:
           - 'arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy'
           - 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore'
           - 'arn:aws:iam::aws:policy/AWSDeadlineCloud-WorkerHost'  
     deadlineInstanceProfile:
       Type: 'AWS::IAM::InstanceProfile'
       Properties:
         Path: /
         Roles:
           - !Ref deadlineInstanceAccessAccessRole
     deadlineLaunchTemplate:
       Type: 'AWS::EC2::LaunchTemplate'
       Properties:
         LaunchTemplateName: !Join
           - ''
           - - deadline-LT-
             - !Ref FleetId
         LaunchTemplateData:
           NetworkInterfaces:
             - DeviceIndex: 0
               AssociatePublicIpAddress: true
               Groups:
                 - !Ref deadlineWorkerSecurityGroup
               DeleteOnTermination: true
           ImageId: !Ref AMIId
           InstanceInitiatedShutdownBehavior: terminate
           IamInstanceProfile:
             Arn: !GetAtt
               - deadlineInstanceProfile
               - Arn
           MetadataOptions:
             HttpTokens: required
             HttpEndpoint: enabled
   
     deadlineAutoScalingGroup:
       Type: 'AWS::AutoScaling::AutoScalingGroup'
       Properties:
         AutoScalingGroupName: !Join
           - ''
           - - deadline-ASG-autoscalable-
             - !Ref FleetId
         MinSize: 0
         MaxSize: 10
         VPCZoneIdentifier:
           - !Ref deadlinePublicSubnet0
           - !Ref deadlinePublicSubnet1
         NewInstancesProtectedFromScaleIn: true
         MixedInstancesPolicy:
           InstancesDistribution:
             OnDemandBaseCapacity: 0
             OnDemandPercentageAboveBaseCapacity: 0
             SpotAllocationStrategy: capacity-optimized
             OnDemandAllocationStrategy: lowest-price
           LaunchTemplate:
             LaunchTemplateSpecification:
               LaunchTemplateId: !Ref deadlineLaunchTemplate
               Version: !GetAtt
                 - deadlineLaunchTemplate
                 - LatestVersionNumber
             Overrides:
               - InstanceType: m5.large
               - InstanceType: m5d.large
               - InstanceType: m5a.large
               - InstanceType: m5ad.large
               - InstanceType: m5n.large
               - InstanceType: m5dn.large
               - InstanceType: m4.large
               - InstanceType: m3.large
               - InstanceType: r5.large
               - InstanceType: r5d.large
               - InstanceType: r5a.large
               - InstanceType: r5ad.large
               - InstanceType: r5n.large
               - InstanceType: r5dn.large
               - InstanceType: r4.large
         MetricsCollection:
           - Granularity: 1Minute
             Metrics:
               - GroupMinSize
               - GroupMaxSize
               - GroupDesiredCapacity
               - GroupInServiceInstances
               - GroupTotalInstances
               - GroupInServiceCapacity
               - GroupTotalCapacity
   ```

1. 在 https：//[https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/) 開啟 CloudFormation 主控台。

   使用 CloudFormation 主控台，使用上傳您建立之範本檔案的指示來建立堆疊。如需詳細資訊，請參閱*AWS CloudFormation 《 使用者指南*》中的[在 CloudFormation 主控台上建立堆疊](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html)。

**注意**  
連接至工作者 Amazon EC2 執行個體的 IAM 角色登入資料可供該工作者上執行*的所有*程序使用，其中包括任務。工作者應擁有最低操作權限： `deadline:CreateWorker`和 `deadline:AssumeFleetRoleForWorker.`
工作者代理程式會取得佇列角色的登入資料，並將其設定為執行任務來使用。Amazon EC2 執行個體描述檔角色不應包含任務所需的許可。

## 使用截止日期雲端擴展建議功能自動擴展 Amazon EC2 機群
<a name="autoscale-ec2-fleet"></a>

Deadline Cloud 利用 Amazon EC2 Auto Scaling (Auto Scaling) 群組自動擴展 Amazon EC2 客戶受管機群 (CMF)。您需要設定機群模式，並在帳戶中部署所需的基礎設施，讓機群自動擴展。您部署的基礎設施適用於所有機群，因此您只需要設定一次。

基本工作流程是：您可以將機群模式設定為自動擴展，然後當建議的機群大小變更 （一個事件包含機群 ID、建議的機群大小和其他中繼資料） 時，Deadline Cloud 會傳送該機群的 EventBridge 事件。您將擁有 EventBridge 規則來篩選相關事件，並擁有 Lambda 來使用它們。Lambda 將與 Amazon EC2 Auto Scaling 整合`AutoScalingGroup`，以自動擴展 Amazon EC2 機群。

### 將機群模式設定為 `EVENT_BASED_AUTO_SCALING`
<a name="set-fleet-mode"></a>

將您的機群模式設定為 `EVENT_BASED_AUTO_SCALING`。您可以使用 主控台來執行此操作，或使用 AWS CLI 直接呼叫 `CreateFleet`或 `UpdateFleet` API。設定模式之後，當建議的機群大小變更時，Deadline Cloud 會開始傳送 EventBridge 事件。
+ 範例`UpdateFleet`命令：

  ```
  aws deadline update-fleet \
    --farm-id FARM_ID \
    --fleet-id FLEET_ID \
    --configuration file://configuration.json
  ```
+ 範例`CreateFleet`命令：

  ```
  aws deadline create-fleet \
    --farm-id FARM_ID \
    --display-name "Fleet name" \
    --max-worker-count 10 \
    --configuration file://configuration.json
  ```

以下是上述 CLI 命令`configuration.json`中使用的範例 (`--configuration file://configuration.json`)。
+ 若要在機群上啟用 Auto Scaling，您應該將 模式設定為 `EVENT_BASED_AUTO_SCALING`。
+ `workerCapabilities` 是建立 CMF 時指派給 CMF 的預設值。如果您需要增加 CMF 可用的資源，您可以變更這些值。

 設定機群模式後，截止日期雲端會開始發出該機群的機群大小建議事件。

```
{
    "customerManaged": {
        "mode": "EVENT_BASED_AUTO_SCALING",       
        "workerCapabilities": {
            "vCpuCount": {
                "min": 1,
                "max": 4
            },
            "memoryMiB": {
                "min": 1024,
                "max": 4096
            },
            "osFamily": "linux",
            "cpuArchitectureType": "x86_64"
        }
    }
}
```

#### 使用 CloudFormation 範本部署 Auto Scaling 堆疊
<a name="deploy-stack"></a>

您可以設定 EventBridge 規則來篩選事件、設定 Lambda 來取用事件並控制 Auto Scaling，以及設定 SQS 佇列來存放未處理的事件。使用下列 CloudFormation 範本來部署堆疊中的所有項目。成功部署資源後，您可以提交任務，機群會自動擴展。

```
Resources:
  AutoScalingLambda:
    Type: 'AWS::Lambda::Function'
    Properties:
      Code:
        ZipFile: |-
          """
          This lambda is configured to handle "Fleet Size Recommendation Change"
          messages. It will handle all such events, and requires
          that the ASG is named based on the fleet id. It will scale up/down the fleet
          based on the recommended fleet size in the message.
          
          Example EventBridge message:
          {
              "version": "0",
              "id": "6a7e8feb-b491-4cf7-a9f1-bf3703467718",
              "detail-type": "Fleet Size Recommendation Change",
              "source": "aws.deadline",
              "account": "111122223333",
              "time": "2017-12-22T18:43:48Z",
              "region": "us-west-1",
              "resources": [],
              "detail": {
                  "farmId": "farm-12345678900000000000000000000000",
                  "fleetId": "fleet-12345678900000000000000000000000",
                  "oldFleetSize": 1,
                  "newFleetSize": 5,
              }
          }
          """
          
          import json
          import boto3
          import logging

          logger = logging.getLogger()
          logger.setLevel(logging.INFO)

          auto_scaling_client = boto3.client("autoscaling")

          def lambda_handler(event, context):
              logger.info(event)
              event_detail = event["detail"]
              fleet_id = event_detail["fleetId"]
              desired_capacity = event_detail["newFleetSize"]

              asg_name = f"deadline-ASG-autoscalable-{fleet_id}"
              auto_scaling_client.set_desired_capacity(
                  AutoScalingGroupName=asg_name,
                  DesiredCapacity=desired_capacity,
                  HonorCooldown=False,
              )

              return {
                  'statusCode': 200,
                  'body': json.dumps(f'Successfully set desired_capacity for {asg_name} to {desired_capacity}')
              }
      Handler: index.lambda_handler
      Role: !GetAtt 
        - AutoScalingLambdaServiceRole
        - Arn
      Runtime: python3.11
    DependsOn:
      - AutoScalingLambdaServiceRoleDefaultPolicy
      - AutoScalingLambdaServiceRole
  AutoScalingEventRule:
    Type: 'AWS::Events::Rule'
    Properties:
      EventPattern:
        source:
          - aws.deadline
        detail-type:
          - Fleet Size Recommendation Change
      State: ENABLED
      Targets:
        - Arn: !GetAtt 
            - AutoScalingLambda
            - Arn
          DeadLetterConfig:
            Arn: !GetAtt 
              - UnprocessedAutoScalingEventQueue
              - Arn
          Id: Target0
          RetryPolicy:
            MaximumRetryAttempts: 15
  AutoScalingEventRuleTargetPermission:
    Type: 'AWS::Lambda::Permission'
    Properties:
      Action: 'lambda:InvokeFunction'
      FunctionName: !GetAtt 
        - AutoScalingLambda
        - Arn
      Principal: events.amazonaws.com
      SourceArn: !GetAtt 
        - AutoScalingEventRule
        - Arn
  AutoScalingLambdaServiceRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action: 'sts:AssumeRole'
            Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
        Version: 2012-10-17
      ManagedPolicyArns:
        - !Join 
          - ''
          - - 'arn:'
            - !Ref 'AWS::Partition'
            - ':iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
  AutoScalingLambdaServiceRoleDefaultPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyDocument:
        Statement:
          - Action: 'autoscaling:SetDesiredCapacity'
            Effect: Allow
            Resource: '*'
        Version: 2012-10-17
      PolicyName: AutoScalingLambdaServiceRoleDefaultPolicy
      Roles:
        - !Ref AutoScalingLambdaServiceRole
  UnprocessedAutoScalingEventQueue:
    Type: 'AWS::SQS::Queue'
    Properties:
      QueueName: deadline-unprocessed-autoscaling-events
    UpdateReplacePolicy: Delete
    DeletionPolicy: Delete
  UnprocessedAutoScalingEventQueuePolicy:
    Type: 'AWS::SQS::QueuePolicy'
    Properties:
      PolicyDocument:
        Statement:
          - Action: 'sqs:SendMessage'
            Condition:
              ArnEquals:
                'aws:SourceArn': !GetAtt 
                  - AutoScalingEventRule
                  - Arn
            Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Resource: !GetAtt 
              - UnprocessedAutoScalingEventQueue
              - Arn
        Version: 2012-10-17
      Queues:
        - !Ref UnprocessedAutoScalingEventQueue
```

## 執行機群運作狀態檢查
<a name="fleet-health-check"></a>

建立機群之後，您應該建立自訂運作狀態檢查，以確保您的機群保持運作狀態良好且沒有停滯的執行個體，以協助避免不必要的成本。請參閱[在 GitHub 上部署截止日期雲端機群運作狀態檢查](https://github.com/aws-deadline/deadline-cloud-samples/tree/mainline/cloudformation/farm_templates/cmf_templates)。 GitHub 運作狀態檢查可以降低意外變更 Amazon Machine Image、啟動範本或執行未偵測到的網路組態的風險。