将现有 CloudWatch 警报配置为创建 OpsItems(以编程方式) - AWS Systems Manager

将现有 CloudWatch 警报配置为创建 OpsItems(以编程方式)

您可以将 Amazon CloudWatch 警报配置为使用 AWS Command Line Interface(AWS CLI)、AWS CloudFormation 模板或 Java 代码段以编程方式创建 OpsItems。

开始前的准备工作

如果您以编程方式编辑现有警报或创建用于创建 OpsItems 的警报,则必须指定 Amazon 资源名称(ARN)。此 ARN 将 Systems Manager OpsCenter 标识为通过告警创建的 OpsItems 目标。您可以自定义 ARN,使通过告警创建的 OpsItems 包含特定信息,如严重性或类别。每个 ARN 包含下表中描述的信息。

参数 详细信息

Region(必填)

告警所在的 AWS 区域。例如:us-west-2。有关如何在 AWS 区域 中使用OpsCenter 的更多信息,请参阅 AWS Systems Manager 端点和配额

account_ID(必填)

创建告警时使用的同一 AWS 账户 ID。例如:123456789012。账户 ID 必须后跟冒号(:)和参数 opsitem,如以下示例所示。

severity(必填)

由用户定义的通过告警创建的 OpsItems 的严重性级别。有效值:1234

Category(可选)

通过告警创建的 OpsItems 的类别。有效值:AvailabilityCostPerformanceRecoverySecurity

使用以下句法创建 ARN。此 ARN 不包括可选的 Category 参数。

arn:aws:ssm:Region:account_ID:opsitem:severity

以下为示例。

arn:aws:ssm:us-west-2:123456789012:opsitem:3

要创建使用可选的 Category 参数的 ARN,请使用以下句法:

arn:aws:ssm:Region:account_ID:opsitem:severity#CATEGORY=category_name

以下为示例。

arn:aws:ssm:us-west-2:123456789012:opsitem:3#CATEGORY=Security

将 CloudWatch 警报配置为创建 OpsItems(AWS CLI)

此命令要求您为 alarm-actions 参数指定 ARN。有关如何创建 ARN 的信息,请参阅 开始前的准备工作

将 CloudWatch 警报配置为创建 OpsItems(AWS CLI)
  1. 安装并配置 AWS Command Line Interface(AWS CLI)(如果尚未执行该操作)。

    有关信息,请参阅安装或更新 AWS CLI 的最新版本

  2. 运行以下命令以收集要配置的告警的信息。

    aws cloudwatch describe-alarms --alarm-names "alarm name"
  3. 运行以下命令以更新告警。将每个示例资源占位符替换为您自己的信息。

    aws cloudwatch put-metric-alarm --alarm-name name \ --alarm-description "description" \ --metric-name name --namespace namespace \ --statistic statistic --period value --threshold value \ --comparison-operator value \ --dimensions "dimensions" --evaluation-periods value \ --alarm-actions arn:aws:ssm:Region:account_ID:opsitem:severity#CATEGORY=category_name \ --unit unit

    以下为示例。

    Linux & macOS
    aws cloudwatch put-metric-alarm --alarm-name cpu-mon \ --alarm-description "Alarm when CPU exceeds 70 percent" \ --metric-name CPUUtilization --namespace AWS/EC2 \ --statistic Average --period 300 --threshold 70 \ --comparison-operator GreaterThanThreshold \ --dimensions "Name=InstanceId,Value=i-12345678" --evaluation-periods 2 \ --alarm-actions arn:aws:ssm:us-east-1:123456789012:opsitem:3#CATEGORY=Security \ --unit Percent
    Windows
    aws cloudwatch put-metric-alarm --alarm-name cpu-mon ^ --alarm-description "Alarm when CPU exceeds 70 percent" ^ --metric-name CPUUtilization --namespace AWS/EC2 ^ --statistic Average --period 300 --threshold 70 ^ --comparison-operator GreaterThanThreshold ^ --dimensions "Name=InstanceId,Value=i-12345678" --evaluation-periods 2 ^ --alarm-actions arn:aws:ssm:us-east-1:123456789012:opsitem:3#CATEGORY=Security ^ --unit Percent

将 CloudWatch 警报配置为创建或更新 OpsItems(CloudFormation)

本部分包括多个 AWS CloudFormation 模板,您可以使用这些模板将 CloudWatch 警报配置为自动创建或更新 OpsItems。每个模板均要求您为 AlarmActions 参数指定 ARN。有关如何创建 ARN 的信息,请参阅 开始前的准备工作

指标警报 - 使用以下 CloudFormation 模板创建或更新 CloudWatch 指标警报。此模板中指定的警报用于监控 Amazon Elastic Compute Cloud(Amazon EC2)实例状态检查。如果告警进入 ALARM 状态,则在 OpsCenter 中创建 OpsItem。

{ "AWSTemplateFormatVersion": "2010-09-09", "Parameters" : { "RecoveryInstance" : { "Description" : "The EC2 instance ID to associate this alarm with.", "Type" : "AWS::EC2::Instance::Id" } }, "Resources": { "RecoveryTestAlarm": { "Type": "AWS::CloudWatch::Alarm", "Properties": { "AlarmDescription": "Run a recovery action when instance status check fails for 15 consecutive minutes.", "Namespace": "AWS/EC2" , "MetricName": "StatusCheckFailed_System", "Statistic": "Minimum", "Period": "60", "EvaluationPeriods": "15", "ComparisonOperator": "GreaterThanThreshold", "Threshold": "0", "AlarmActions": [ {"Fn::Join" : ["", ["arn:arn:aws:ssm:Region:account_ID:opsitem:severity#CATEGORY=category_name", { "Ref" : "AWS::Partition" }, ":ssm:", { "Ref" : "AWS::Region" }, { "Ref" : "AWS:: AccountId" }, ":opsitem:3" ]]} ], "Dimensions": [{"Name": "InstanceId","Value": {"Ref": "RecoveryInstance"}}] } } } }

复合警报 - 使用以下 CloudFormation 模板创建或更新复合警报。复合告警由多个指标告警组成。如果告警进入 ALARM 状态,则在 OpsCenter 中创建 OpsItem。

"Resources":{ "HighResourceUsage":{ "Type":"AWS::CloudWatch::CompositeAlarm", "Properties":{ "AlarmName":"HighResourceUsage", "AlarmRule":"(ALARM(HighCPUUsage) OR ALARM(HighMemoryUsage)) AND NOT ALARM(DeploymentInProgress)", "AlarmActions":"arn:aws:ssm:Region:account_ID:opsitem:severity#CATEGORY=category_name", "AlarmDescription":"Indicates that the system resource usage is high while no known deployment is in progress" }, "DependsOn":[ "DeploymentInProgress", "HighCPUUsage", "HighMemoryUsage" ] }, "DeploymentInProgress":{ "Type":"AWS::CloudWatch::CompositeAlarm", "Properties":{ "AlarmName":"DeploymentInProgress", "AlarmRule":"FALSE", "AlarmDescription":"Manually updated to TRUE/FALSE to disable other alarms" } }, "HighCPUUsage":{ "Type":"AWS::CloudWatch::Alarm", "Properties":{ "AlarmDescription":"CPUusageishigh", "AlarmName":"HighCPUUsage", "ComparisonOperator":"GreaterThanThreshold", "EvaluationPeriods":1, "MetricName":"CPUUsage", "Namespace":"CustomNamespace", "Period":60, "Statistic":"Average", "Threshold":70, "TreatMissingData":"notBreaching" } }, "HighMemoryUsage":{ "Type":"AWS::CloudWatch::Alarm", "Properties":{ "AlarmDescription":"Memoryusageishigh", "AlarmName":"HighMemoryUsage", "ComparisonOperator":"GreaterThanThreshold", "EvaluationPeriods":1, "MetricName":"MemoryUsage", "Namespace":"CustomNamespace", "Period":60, "Statistic":"Average", "Threshold":65, "TreatMissingData":"breaching" } } }

将 CloudWatch 警报配置为创建或更新 OpsItems(Java)

本部分包括多个 Java 代码段,您可以使用这些代码段将 CloudWatch 警报配置为自动创建或更新 OpsItems。每个代码段均要求您为 validSsmActionStr 参数指定 ARN。有关如何创建 ARN 的信息,请参阅 开始前的准备工作

特定警报 - 使用以下 Java 代码段创建或更新 CloudWatch 警报。此模板中指定的告警用于监控 Amazon EC2 实例状态检查。如果告警进入 ALARM 状态,则在 OpsCenter 中创建 OpsItem。

import com.amazonaws.services.cloudwatch.AmazonCloudWatch; import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder; import com.amazonaws.services.cloudwatch.model.ComparisonOperator; import com.amazonaws.services.cloudwatch.model.Dimension; import com.amazonaws.services.cloudwatch.model.PutMetricAlarmRequest; import com.amazonaws.services.cloudwatch.model.PutMetricAlarmResult; import com.amazonaws.services.cloudwatch.model.StandardUnit; import com.amazonaws.services.cloudwatch.model.Statistic; private void putMetricAlarmWithSsmAction() { final AmazonCloudWatch cw = AmazonCloudWatchClientBuilder.defaultClient(); Dimension dimension = new Dimension() .withName("InstanceId") .withValue(instanceId); String validSsmActionStr = "arn:aws:ssm:Region:account_ID:opsitem:severity#CATEGORY=category_name"; PutMetricAlarmRequest request = new PutMetricAlarmRequest() .withAlarmName(alarmName) .withComparisonOperator( ComparisonOperator.GreaterThanThreshold) .withEvaluationPeriods(1) .withMetricName("CPUUtilization") .withNamespace("AWS/EC2") .withPeriod(60) .withStatistic(Statistic.Average) .withThreshold(70.0) .withActionsEnabled(false) .withAlarmDescription( "Alarm when server CPU utilization exceeds 70%") .withUnit(StandardUnit.Seconds) .withDimensions(dimension) .withAlarmActions(validSsmActionStr); PutMetricAlarmResult response = cw.putMetricAlarm(request); }

更新所有警报 - 使用以下 Java 代码段更新您的 AWS 账户 中的所有 CloudWatch 警报,以便在警报进入 ALARM 状态时创建 OpsItems。

import com.amazonaws.services.cloudwatch.AmazonCloudWatch; import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder; import com.amazonaws.services.cloudwatch.model.DescribeAlarmsRequest; import com.amazonaws.services.cloudwatch.model.DescribeAlarmsResult; import com.amazonaws.services.cloudwatch.model.MetricAlarm; private void listMetricAlarmsAndAddSsmAction() { final AmazonCloudWatch cw = AmazonCloudWatchClientBuilder.defaultClient(); boolean done = false; DescribeAlarmsRequest request = new DescribeAlarmsRequest(); String validSsmActionStr = "arn:aws:ssm:Region:account_ID:opsitem:severity#CATEGORY=category_name"; while(!done) { DescribeAlarmsResult response = cw.describeAlarms(request); for(MetricAlarm alarm : response.getMetricAlarms()) { // assuming there are no alarm actions added for the metric alarm alarm.setAlarmActions(ImmutableList.of(validSsmActionStr)); } request.setNextToken(response.getNextToken()); if(response.getNextToken() == null) { done = true; } } }