# CloudWatch investigations
<a name="Investigations"></a>

The CloudWatch investigations feature is a generative AI-powered assistant that can help you respond to incidents in your system. It uses generative AI to scan your system's telemetry and quickly surface telemetry data and suggestions that might be related to your issue. These suggestions include metrics, logs, deployment events, and root-cause hypotheses with visual representations when multiple resources are involved. For a complete list of types of data that the AI assistant can surface, see [Insights that CloudWatch investigations can surface in investigations](Investigations-SuggestionTypes.md). 

You can conduct investigations without any additional configuration in CloudWatch operational troubleshooting. When you start an investigation, CloudWatch investigations uses the permissions associated with the signed-in user to investigate and analyze the resources associated with the alarm, metrics, or Logs Insights query and provide troubleshooting suggestions. No resources are created by the investigation, and every action taken by CloudWatch investigations is logged in CloudTrail for traceability. During the investigation, you can do the following to assist with operational troubleshooting:
+ View AI-generated observations, suggestions, and hypotheses
+ Access visual representations of multi-resource hypotheses
+ Review natural language explanations and root cause analysis
+ Access AI analysis of telemetry data, including metrics, logs, deployment events, AWS Health events, CloudTrail change events, X-Ray trace data, and CloudWatch Logs Insights queries

Configuring CloudWatch investigations gives you more in-depth investigations.

 When you configure CloudWatch investigations, your investigations have the following additional capabilities: 
+ Accept or discard suggestions and observations

  For each suggestion, you decide whether to add it to the investigation findings or to discard it. This helps CloudWatch investigations refine and iterate toward the root cause of the issue. CloudWatch investigations can help you find the root cause without having to manually identify and query multiple metrics and other sources of telemetry and events. A troubleshooting issue that would have taken hours of searching and switching between different consoles can be solved in a much shorter time.
+ Configure cross account access

  Use CloudWatch cross-account observability to enable investigations to collect data from other source accounts.
+ Add new telemetry sources to the investigation

  Adding data from CloudTrail event history helps CloudWatch investigations associate issues with change events. Adding X-Ray provides improved topology and application mapping. You can also add data from Application Signals to dive deeper into the health of your applications and services, combining that telemetry with the other telemetry sources. If you use Amazon EKS clusters, you can give CloudWatch investigations access to your EKS resources so that it can provide more granular information about the cluster resources that might be involved in the issue under investigation.
+ Add notes or comments to investigation findings

  You can add context to investigation findings to give perspective during reporting or auditing.
+ Execute suggested runbook remediations

  CloudWatch investigations might suggest that you use an Automation runbook to attempt to automatically resolve the issue. Automation is a capability in Systems Manager, another AWS service. Automation runbooks define a series of steps, or actions, to be run on the resources that you select. Each runbook is designed to address a specific issue. 
+ Share investigation results with team members

  Without additional configuration, investigations are linked with the signed-in user's session. Other users can't view the investigation results or continue the investigation. After you configure CloudWatch investigations, investigations are available to all users in the account that have been granted the required permissions.
+ End, archive, or reopen the investigation manually

  Before CloudWatch investigations is configured in your account, investigations run once and then complete. After CloudWatch investigations is configured, investigations can continue until resolved. After the issue is resolved, the investigation is archived. If you have resolved the issue, but the conditions that caused the investigation are still present, you can manually close the investigation. If the conditions re-emerge, you can restart (or reopen) the investigation.
+ Investigation reporting

  When you complete an investigation, you can generate a comprehensive investigation report that automatically captures all investigation findings, timeline events, and recommended actions.

Configuration of CloudWatch investigations creates an investigation group in your account. Each account can have one investigation group with up to 2 concurrent active investigations in the investigation group. Each month, each account can create up to 150 enhanced investigations with AI analysis. Investigation groups are account-level configurations. When an investigation group is created in an account, it is used with all investigations started in the account.

**Note**  
When you configure CloudWatch investigations, CloudWatch will use the provided IAM role to periodically scan resources in your account for the purpose of mapping resources and telemetry. Some services like Lambda will invoke the KMS decrypt API on behalf of CloudWatch for certain API calls related to describing or listing resources. This background process is performed to ensure that the topology reflects the most recent state of the account and its dependencies. This refresh occurs regardless of whether there is an active investigation or not.

**Topics**
+ [Methods to create an investigation](creation-methods.md)
+ [Understanding hypothesis visualizations](Investigations-HypothesisVisualization.md)
+ [How CloudWatch investigations finds data for suggestions](data-usage-considerations.md)
+ [Costs associated with CloudWatch investigations](investigations-costs.md)
+ [Insights that CloudWatch investigations can surface in investigations](Investigations-SuggestionTypes.md)
+ [AWS services where investigations are supported](Investigations-Services.md)
+ [Conduct a CloudWatch investigation without additional configuration](Investigations-Ephemeral.md)
+ [Configure CloudWatch investigations](Investigations-GetStarted.md)
+ [(Recommended) Best practices to enhance investigations](Investigations-RecommendedServices.md)
+ [Investigate operational issues in your environment](Investigations-Investigate.md)
+ [Cross-account investigations](Investigations-cross-account.md)
+ [Generate incident reports](Investigations-Incident-Reports.md)
+ [Integrations with other systems](Investigations-Integrations.md)
+ [Security in CloudWatch investigations](Investigations-Security.md)
+ [CloudWatch investigations data retention](Investigations-Retention.md)
+ [Troubleshooting](Investigations-Troubleshooting.md)

# Methods to create an investigation
<a name="creation-methods"></a>

You can create investigations in the following ways:
+ From within many AWS consoles. For example, you can start an investigation when viewing a CloudWatch metric or alarm in the CloudWatch console, or from a Lambda function's **Monitor** tab on its properties page.
+ By following a prompt in chat with CloudWatch investigations. You can start by asking questions like "Why is my Lambda function slow today?" or "What's wrong with my database?"
+ By configuring a CloudWatch alarm action to automatically start an investigation when the alarm goes into ALARM state. 

After you start an investigation with any of these methods, CloudWatch investigations scans your system to find telemetry that might be relevant to the situation, and also generates hypotheses based on what it finds. CloudWatch investigations surfaces both the telemetry data and the hypotheses. At any time after accepting a hypothesis, you can generate a comprehensive incident report that automatically captures the current investigation findings, timeline events, and recommended actions.

# Understanding hypothesis visualizations
<a name="Investigations-HypothesisVisualization"></a>

When CloudWatch investigations generates hypotheses that include multiple resources, the investigation view provides a visual representation of the causal relationships between those resources. This visual hypothesis view helps you quickly understand complex issues without reading lengthy text explanations.

The hypothesis visualization displays resources as nodes connected by the pathways identified by CloudWatch investigations. For example, if a hypothesis involves Lambda function A affecting DynamoDB table B, you'll see two nodes joined by a connection that visualizes the relationship.

**Key features of hypothesis visualizations:**
+ **Resource nodes** - Each AWS resource mentioned in the hypothesis appears as a distinct node, labeled with the resource type and identifier.
+ **Connections** - Connections between nodes indicate the relationships that CloudWatch investigations has identified.
+ **Visual context** - The layout helps you understand the scope and complexity of multi-resource issues at a glance.

This visual representation is particularly valuable for:
+ Understanding distributed system failures that span multiple services
+ Identifying upstream and downstream impact relationships
+ Quickly assessing the scope of an issue before diving into detailed analysis

**Note**  
Hypothesis visualizations are automatically generated when CloudWatch investigations identifies causal relationships between multiple resources.
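
The node-and-connection structure described above can be pictured as a small directed graph. The following is a minimal sketch for illustration only: the resource names and the `downstream` helper are hypothetical, and it does not represent how CloudWatch investigations stores or computes hypotheses.

```python
from collections import deque

# Hypothetical connections: each pair means "an issue in the first
# resource might affect the second resource".
edges = [
    ("lambda:function-A", "dynamodb:table-B"),
    ("dynamodb:table-B", "lambda:function-C"),
]


def downstream(start, edges):
    """Return every resource reachable from start, in breadth-first order."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


print(downstream("lambda:function-A", edges))
```

Walking the connections from one node is one way to reason about the downstream impact that the visualization shows at a glance.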

# How CloudWatch investigations finds data for suggestions
<a name="data-usage-considerations"></a>

CloudWatch investigations uses a wide range of data sources to determine dependency relationships and plan analysis paths, including telemetry data configurations, service configurations, and observed relationships. These dependency relationships are found more easily if you use CloudWatch Application Signals and AWS X-Ray. When Application Signals and X-Ray aren't available, CloudWatch investigations will attempt to infer dependency relationships through co-occurring telemetry anomalies.

While CloudWatch investigations will continue to analyze telemetry data and provide suggestions without these features enabled, we strongly recommend that you enable the services and features listed in [(Recommended) Best practices to enhance investigations](Investigations-RecommendedServices.md) for optimal quality and performance for CloudWatch investigations. 

**Important**  
To help CloudWatch investigations provide the most relevant information, we might use certain content from CloudWatch investigations, including but not limited to, questions that you ask CloudWatch investigations and its response, insights, user interactions, telemetry, and metadata for service improvements. Your trust and privacy, as well as the security of your content, is our highest priority. For more information, see [https://aws.amazon.com/service-terms/](https://aws.amazon.com/service-terms/) and [https://aws.amazon.com/ai/responsible-ai/policy/](https://aws.amazon.com/ai/responsible-ai/policy/).  
You can opt out of having your content collected to develop or improve the quality of CloudWatch investigations by creating an AI service opt-out policy for CloudWatch or AI Operations (aiops). For more information, see [AI services opt-out policies](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_ai-opt-out.html) in the AWS Organizations User Guide. 

# Costs associated with CloudWatch investigations
<a name="investigations-costs"></a>

CloudWatch investigations might incur AWS service usage including telemetry and resource queries and other API usage. While the majority of these will not be charged to your AWS bill, there are exceptions including, but not limited to:
+ CloudWatch APIs (`ListMetrics`, `GetDashboard`, `ListDashboards`, and `GetInsightRuleReport`)
+ X-Ray APIs (`GetServiceGraph`, `GetTraceSummaries`, and `BatchGetTraces`)
+ CloudWatch investigations also uses AWS Cloud Control APIs which might incur usage of AWS services such as Amazon Kinesis Data Streams and AWS Lambda.
+ Additionally, if you choose to integrate CloudWatch investigations in chat applications you might incur usage of Amazon Simple Notification Service.

 For usage of these services exceeding the AWS Free Tier, you will see charges on your AWS bill. These charges are expected to be minimal for normal usage of CloudWatch investigations. For more information, see [Amazon Kinesis Data Streams pricing](https://aws.amazon.com/kinesis/data-streams/pricing/), [AWS Lambda pricing for Automation](https://aws.amazon.com/lambda/pricing/), and [Amazon Simple Notification Service pricing](https://aws.amazon.com/sns/pricing/).

# Insights that CloudWatch investigations can surface in investigations
<a name="Investigations-SuggestionTypes"></a>

CloudWatch investigations can surface the following types of items and add them to the **Suggestions** tab of an investigation. For hypotheses involving multiple resources, visual diagrams may also be provided to illustrate causal relationships.
+ Hypotheses about root causes
+ CloudWatch alarms, including both metric alarms and composite alarms
+ CloudWatch metrics
+ AWS Health events
+ Change events logged in CloudTrail
+ X-Ray trace data
+ CloudWatch Logs Insights queries for log groups in the Standard log class
+ CloudWatch Contributor Insights data
+ CloudWatch Application Signals data
+ CloudWatch Database Insights data

# AWS services where investigations are supported
<a name="Investigations-Services"></a>

You can launch investigations from telemetry data (such as CloudWatch metrics, alarms, and logs), review generated anomaly signals, and explore hypotheses in investigations. CloudWatch investigations works best when providing automated troubleshooting guidance for the following AWS services:
+ Amazon API Gateway
+ AWS AppSync
+ Amazon Data Firehose
+ Amazon DynamoDB
+ Amazon EBS<sup>1</sup>
+ Amazon EC2<sup>1</sup>
+ Amazon ECS on Amazon EC2<sup>2</sup>
+ Amazon ECS on Fargate<sup>2</sup>
+ Amazon EKS<sup>3</sup>
+ Amazon Kinesis Data Streams
+ AWS Lambda
+ Amazon OpenSearch Service
+ Amazon RDS<sup>4</sup>
+ Amazon S3
+ Amazon SNS
+ Amazon SQS
+ AWS Step Functions

The list of services will continue to be expanded over time. CloudWatch investigations utilizes a wide range of data sources to determine dependency relationships and plan analysis paths, including telemetry data configurations, service configurations, and observed relationships through CloudWatch Application Signals and X-Ray. Where none of the above is available, CloudWatch investigations will attempt to infer dependency relationships through co-occurring telemetry anomalies.

**Best practice setup**

While CloudWatch investigations will continue to analyze telemetry data and provide suggestions without the following features enabled, for optimal quality and performance for CloudWatch investigations, we recommend you complete the following steps:
+ <sup>1</sup> For both Amazon EC2 and Amazon EBS, update your CloudWatch agent to version 1.30049.1 or later. For more information, see [Collect metrics, logs, and traces using the CloudWatch agent](Install-CloudWatch-Agent.md).
+ <sup>2</sup> For Amazon ECS, enable CloudWatch Container Insights. For more information, see [Container Insights](ContainerInsights.md).
+ <sup>3</sup> For Amazon EKS, enable CloudWatch Container Insights and configure Amazon EKS Access Entries. For more information, see [Container Insights](ContainerInsights.md) and [Integration with Amazon EKS](EKS-Integration.md).
+ <sup>4</sup> For Amazon RDS, enable CloudWatch Database Insights in the **Advanced mode**. For more information, see [Turning on the Advanced mode of Database Insights for Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_DatabaseInsights.TurningOnAdvanced.html) in the *Aurora User Guide*.
+ Enable CloudWatch Application Signals and X-Ray. For more information, see [Application Signals](CloudWatch-Application-Monitoring-Sections.md) and [What is AWS X-Ray](https://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html). 
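
The minimum CloudWatch agent version in the list above can be checked with a simple comparison. This is a minimal sketch that assumes the version is a purely numeric, dot-separated string. The helper names are hypothetical, and how you obtain the installed version string is left to the caller.

```python
def parse_version(text):
    """Split a dotted version string such as '1.30049.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum agent version stated in the best practice list.
MINIMUM_AGENT_VERSION = parse_version("1.30049.1")


def agent_is_recent_enough(installed):
    """Return True if the installed version is at or above the minimum."""
    return parse_version(installed) >= MINIMUM_AGENT_VERSION


print(agent_is_recent_enough("1.30049.1"))
print(agent_is_recent_enough("1.30048.9"))
```

Comparing tuples of integers avoids the mistakes of plain string comparison, where `"1.9.0"` would sort after `"1.30049.1"`.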

# Conduct a CloudWatch investigation without additional configuration
<a name="Investigations-Ephemeral"></a>

You can run an AI-powered root cause analysis with CloudWatch investigations without any additional configuration of your AWS account by using the **Investigations** feature available in **Operational troubleshooting**.

These investigations provide instant access to CloudWatch investigations capabilities from commonly used telemetry sources in the AWS console. This type of investigation is session-based, read-only, and automatically deleted after 24 hours.

The investigation pane provides the following:
+ **AI-generated observations** - Telemetry that fans out from the initial observation that you provided, locating additional supporting data that can lead to a potential root cause.
+ **Root cause hypotheses** - Potential explanations for what's causing the problem, including causal diagrams when multiple resources are involved.
+ **Natural language explanations** - Easy-to-understand descriptions of findings and recommendations

You can use the suggestions from the investigation pane to help accelerate your troubleshooting process.

## To start an investigation
<a name="Investigations-Ephemeral-Start"></a>

1. Navigate to any supported telemetry source showing an issue you want to investigate, such as:
   + A CloudWatch metric showing unusual patterns
   + A CloudWatch alarm in ALARM state
   + A Lambda function's monitoring tab showing performance issues

1. In **Operational troubleshooting**, choose the **Investigate** tab, then choose **Start investigation**. 

1. Watch as CloudWatch investigations analyzes your telemetry and generates insights about potential root causes.

1. (Optional) To add additional telemetry sources or collaborate on the investigation with others, choose **Get started** in the information box at the bottom of the **Investigate** tab. You'll be guided through the investigation group configuration process. For more information, see [Configure CloudWatch investigations](Investigations-GetStarted.md).
**Important**  
You must have the appropriate IAM permissions to create or access investigation groups. Users with read-only permissions will see information about requesting additional permissions from their administrators.

# Configure CloudWatch investigations
<a name="Investigations-GetStarted"></a>

Configuring CloudWatch investigations creates an investigation group that CloudWatch investigations will use to access telemetry and aggregate data related to the investigation. Before you create the investigation group, you can walk through a sample investigation to get an overall idea of how investigations work.

**Topics**
+ [See a sample investigation](Investigations-sample.md)
+ [Set up an investigation group](Investigations-GetStarted-Group.md)
+ [Configure alarms to create investigations](Investigations-configure-alarms.md)
+ [Integration with Amazon EKS](EKS-Integration.md)

# See a sample investigation
<a name="Investigations-sample"></a>

If you'd like to see an investigation in action before you configure an investigation group for your account, you can walk through a sample investigation. The sample investigation doesn't use your data and doesn't make data calls or start API operations in your account.

**To view the sample investigation**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the left navigation pane, choose **AI Operations**, **Overview**.

1. Choose **Try a sample investigation**.

   The console displays the sample investigation, with suggestions and findings in the right pane. In each popup, choose **Next** to advance to the next part of the sample walkthrough.

# Set up an investigation group
<a name="Investigations-GetStarted-Group"></a>

To set up CloudWatch investigations in your account for use with an enhanced investigation, you create an *investigation group*. Creating an investigation group is a one-time setup task. After the group is created, it is used for all subsequent investigations. Settings in the investigation group help you centrally manage the common properties of your investigations, such as the following:
+ Who can access the investigations
+ Cross-account investigation support to access resources in other accounts during the investigation
+ Whether investigation data is encrypted with a customer managed AWS Key Management Service key
+ How long investigations and their data are retained by default

You can have one investigation group per account. Each investigation in your account will be part of this investigation group.

To create an investigation group, you must be signed in to an IAM principal that has either the **AIOpsConsoleAdminPolicy** or the **AdministratorAccess** IAM policy attached, or to an account that has similar permissions.

**Note**  
To be able to choose the recommended option of creating a new IAM role for CloudWatch investigations, you must be signed in to an IAM principal that has the `iam:CreateRole`, `iam:AttachRolePolicy`, and `iam:PutRolePolicy` permissions.

**Important**  
CloudWatch investigations uses *cross-Region inference* to distribute traffic across different AWS Regions. For more information, see [Cross-Region inference](Investigations-Security.md#cross-region-inference).

**To create an investigation group in your account**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the left navigation pane, choose **AI Operations**, **Configuration**.

1. Choose **Configure for this account**.

1. (Optional) Change the retention period for investigations. For more information about what the retention period governs, see [CloudWatch investigations data retention](Investigations-Retention.md).

1. (Optional) To encrypt your investigation data with a customer managed AWS KMS key, choose **Customize encryption settings** and follow the steps to create or specify a key to use. If you don't specify a customer managed key, CloudWatch investigations uses an AWS owned key for encryption. For more information, see [Encryption of investigation data](Investigations-Security.md#Investigations-KMS). 

1. Choose how to give CloudWatch investigations permissions to access resources. To be able to choose either of the first two options, you must be signed in to an IAM principal that has the `iam:CreateRole`, `iam:AttachRolePolicy`, and `iam:PutRolePolicy` permissions.

   1. (Recommended) Select **Auto-create a new role with default investigation permissions**. This role will be granted permissions using the AWS managed policies for AI Operations. For more information, see [User permissions for your CloudWatch investigations group](Investigations-Security.md#Investigations-Security-IAM).

   1. Create a new role yourself and then assign the policy templates. 

   1. Choose **Assign an existing role** if you already have a role with the permissions that you want to use.

      If you choose this option, you must make sure the role includes a trust policy that names `aiops.amazonaws.com` as the service principal. For more information about using service principals in trust policies, see [AWS service principals](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html#principal-services).

      We also recommend that you include a `Condition` section with the account number, to prevent a confused deputy situation. The following example trust policy illustrates both the service principal and the `Condition` section. 

------
#### [ JSON ]

      ```
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                      "Service": "aiops.amazonaws.com"
                  },
                  "Action": "sts:AssumeRole",
                  "Condition": {
                      "StringEquals": {
                          "aws:SourceAccount": "123456789012"
                      },
                      "ArnLike": {
                          "aws:SourceArn": "arn:aws:aiops:us-east-1:123456789012:*"
                      }
                  }
              }
          ]
      }
      ```

------

1. Choose **Create investigation group**. You can now create an investigation from an alarm, metric, or Logs Insights query.
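
If you assign an existing role, the example trust policy from the role step above can also be generated for a specific account and Region. The following is a minimal sketch that reproduces the same policy structure. The `build_trust_policy` function name is hypothetical, and the account ID and Region shown are the placeholder values from the example.

```python
import json


def build_trust_policy(account_id, region):
    """Return a trust policy that lets aiops.amazonaws.com assume the role.

    The Condition block restricts the caller to the given account and
    Region, which helps prevent a confused deputy situation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "aiops.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {
                        "aws:SourceArn": f"arn:aws:aiops:{region}:{account_id}:*"
                    },
                },
            }
        ],
    }


# Placeholder values taken from the example policy.
policy = build_trust_policy("123456789012", "us-east-1")
print(json.dumps(policy, indent=4))
```

Generating the document from parameters keeps the account ID consistent between the `aws:SourceAccount` and `aws:SourceArn` conditions.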

Optionally, you can set up additional recommended configurations to enhance your experience.

1. In the left navigation pane, choose **AI Operations**, **Configuration**.

1. On the **Optional configuration** tab, choose the enhancements you want to add to CloudWatch investigations.

1. In **Configure cross account access** you can set this account as a monitoring account that collects data from other source accounts in your organization. For more information, see [Cross-account investigations](Investigations-cross-account.md). 

1. For **Enhanced integrations**, choose to allow CloudWatch investigations access to additional services in your system, to enable it to gather more data and be more useful.

   1. In the **Tags for application boundary detection** section, enter the existing custom tag keys for custom applications in your system. Resource tags help CloudWatch investigations narrow the search space when it is unable to discover definite relationships between resources. For example, CloudWatch investigations can discover that an Amazon ECS service depends on an Amazon RDS database by using data sources such as X-Ray and CloudWatch Application Signals. However, if you haven't deployed these features, CloudWatch investigations will attempt to identify possible relationships. Tag boundaries can be used to narrow the resources that CloudWatch investigations discovers in these cases.

      You don't need to enter tags created by myApplications or CloudFormation, because CloudWatch investigations can automatically detect those tags.

   1. CloudTrail records events about changes in your system including deployment events. These events can often be useful to CloudWatch investigations to create hypotheses about root causes of issues in your system. In the **CloudTrail for change event detection** section, you can give CloudWatch investigations some access to the events logged by AWS CloudTrail by enabling **Allow the assistant access to CloudTrail change events through the CloudTrail Event history**. For more information, see [Working with CloudTrail Event history](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/view-cloudtrail-events.html).

   1. The **X-Ray for topology mapping** and **Application Signals for health assessment** sections point out other AWS services that can help CloudWatch investigations find information. If you have deployed them and you have granted the **AIOpsAssistantPolicy** IAM policy to CloudWatch investigations, it will be able to access X-Ray and Application Signals telemetry.

      For more information about how these services help CloudWatch investigations, see [X-Ray](Investigations-RecommendedServices.md#Investigations-Xray) and [CloudWatch Application Signals](Investigations-RecommendedServices.md#Investigations-ApplicationSignals).

   1. If you use Amazon EKS, your CloudWatch investigations investigation group can utilize information directly from your Amazon EKS cluster once you set up access entries. For more information, see [Integration with Amazon EKS](EKS-Integration.md).

   1. If you use Amazon RDS, enable the Advanced mode of Database Insights on your database instances. Database Insights monitors database load and provides detailed performance analysis that helps CloudWatch investigations identify database-related issues during investigations. When Advanced Database Insights is enabled, CloudWatch investigations can automatically generate performance analysis reports that include detailed observations, metric anomalies, root cause analysis, and recommendations specific to your database workload. For more information about Database Insights and how to enable Advanced mode, see [Monitoring Amazon RDS databases with CloudWatch Database Insights](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_DatabaseInsights.html).

1. You can integrate CloudWatch investigations with a *chat channel*. This makes it possible to receive notifications about an investigation through the chat channel. CloudWatch investigations supports chat channels in the following applications:
   + Slack
   + Microsoft Teams

   If you want to integrate with a chat channel, we recommend that you complete some additional steps before enabling this enhancement in your investigation group. For more information, see [Integration with third-party chat systems](Investigations-Integrations.md#Investigations-Integrations-Chat).

   Then, perform the following steps to integrate with a chat channel in chat applications:
   + In the **Chat client integration** section, choose **Select SNS topic**.
   + Select the SNS topic to use for sending notifications about your investigations.

# Configure alarms to create investigations
<a name="Investigations-configure-alarms"></a>

You can configure an existing CloudWatch alarm to automatically create investigations in CloudWatch investigations. When the alarm enters the ALARM state, CloudWatch automatically creates a new investigation or adds to an existing investigation based on the deduplication string.

When configuring an alarm to automatically create investigations, you'll need to specify an Amazon Resource Name (ARN) in the alarm's actionArns. This ARN identifies the investigation group where alarm-triggered investigations will be created. You can optionally include a deduplication string in the ARN to group related alarms.

## ARN format and parameters
<a name="Investigations-arn-format"></a>

The ARN pattern for investigation group alarm actions follows this format:

```
arn:aws:aiops:region:account-id:investigation-group/investigation-group-identifier#DEDUPE_STRING=value
```

The following table describes each ARN component:


| Parameter | Description | 
| --- | --- | 
| region (required) | The AWS Region where your investigation group is located. For example: us-east-1. | 
| account-id (required) | Your 12-digit AWS account ID. For example: 123456789012. | 
| investigation-group-identifier (required) | The unique identifier of your investigation group. For example: sMwwg1IogXdvL7UZ. | 
| DEDUPE\_STRING=value (optional) | A deduplication string that groups related alarms into the same investigation. When multiple alarms use the same deduplication string, they contribute to a single investigation instead of creating separate ones. | 

**Example without deduplication string:**

```
arn:aws:aiops:us-east-1:123456789012:investigation-group/sMwwg1IogXdvL7UZ
```

**Example with deduplication string:**

```
arn:aws:aiops:us-east-1:123456789012:investigation-group/sMwwg1IogXdvL7UZ#DEDUPE_STRING=performance
```
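
The ARN format above can be assembled programmatically when you manage many alarms. The following is a minimal sketch that follows the documented pattern. The `investigation_action_arn` function name and its validation of the account ID are illustrative assumptions, not part of any AWS SDK.

```python
def investigation_action_arn(region, account_id, group_id, dedupe=None):
    """Build an investigation group alarm-action ARN.

    Follows the documented pattern:
    arn:aws:aiops:region:account-id:investigation-group/identifier
    with an optional #DEDUPE_STRING=value suffix.
    """
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit string")

    arn = f"arn:aws:aiops:{region}:{account_id}:investigation-group/{group_id}"
    if dedupe:
        arn += f"#DEDUPE_STRING={dedupe}"
    return arn


# Placeholder values taken from the examples above.
print(investigation_action_arn("us-east-1", "123456789012", "sMwwg1IogXdvL7UZ"))
print(
    investigation_action_arn(
        "us-east-1", "123456789012", "sMwwg1IogXdvL7UZ", dedupe="performance"
    )
)
```

The two calls reproduce the examples without and with a deduplication string.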

### Benefits of deduplication strings
<a name="Investigations-deduplication-benefits"></a>

Deduplication strings help you organize related alarms and reduce investigation fragmentation. Use deduplication strings when:
+ **Multiple alarms monitor the same system** - CPU, memory, and disk alarms for the same EC2 instance can share a deduplication string to create one comprehensive investigation.
+ **Cascading failures occur** - When one issue triggers multiple related alarms, the same deduplication string prevents creating separate investigations for each symptom.
+ **You want to categorize by problem type** - Use descriptive strings like "performance", "connectivity", or "security" to group alarms by issue category.

Effective deduplication string examples:
+ `DEDUPE_STRING=webserver-performance` - Groups performance-related alarms for web servers
+ `DEDUPE_STRING=database-connectivity` - Groups database connection issues
+ `DEDUPE_STRING=instance-i-1234567890abcdef0` - Groups all alarms for a specific EC2 instance

**Note**  
If no deduplication string is specified, the system uses a default combination of alarm name, account ID, and region to group investigations.

For more information about investigation groups, see [Set up an investigation group](Investigations-GetStarted-Group.md).

# Configure an alarm to create investigations
<a name="Investigations-configure-alarm-procedures"></a>

After you have an investigation group set up in your account, you can configure existing CloudWatch alarms to automatically create investigations when they enter the ALARM state. This eliminates the need to manually start investigations and ensures consistent response to operational issues. You can configure alarms using the AWS Management Console, AWS CLI, CloudFormation, or AWS SDKs.

------
#### [ Console ]

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Alarms**, and select an existing alarm.

1. Choose **Actions**, **Edit**.

1. In the **Alarm actions** section, choose **Add alarm action**.

1. Under the **Configure actions**, **Investigation action** section, choose the investigation group ARN.

1. (Optional) Add a deduplication string to group related alarms.

1. Choose **Update alarm**.

------
#### [ CLI ]

This command requires that you specify an ARN for the `alarm-actions` parameter. For information about how to create the ARN, see [ARN format and parameters](Investigations-configure-alarms.md#Investigations-arn-format).

**To configure a CloudWatch alarm with InvestigationGroup action (AWS CLI)**

1. Install and configure the AWS CLI, if you haven't already. For information, see [Installing or updating the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

1. Run the following command to collect information about the alarm that you want to configure.

   ```
   aws cloudwatch describe-alarms --alarm-names "alarm name"
   ```

1. Run the following command to update an alarm. Replace each *example resource placeholder* with your own information.

   ```
   aws cloudwatch put-metric-alarm --alarm-name name \
   --alarm-description "description" \
   --metric-name name --namespace namespace \
   --statistic statistic --period value --threshold value \
   --comparison-operator value \
   --dimensions "dimensions" --evaluation-periods value \
   --alarm-actions "arn:aws:aiops:{region}:{account-id}:investigation-group/{investigationGroupIdentifier}#DEDUPE_STRING={my-dedupe-string}"
   ```

   Here's an example.

   ```
   # Without deduplication string
   aws cloudwatch put-metric-alarm --alarm-name cpu-mon \
   --alarm-description "Alarm when CPU exceeds 70 percent" \
   --metric-name CPUUtilization --namespace AWS/EC2 \
   --statistic Average --period 300 --threshold 70 \
   --comparison-operator GreaterThanThreshold \
   --dimensions "Name=InstanceId,Value=i-12345678" --evaluation-periods 2 \
   --alarm-actions arn:aws:aiops:us-east-1:123456789012:investigation-group/sMwwg1IogXdvL7UZ \
   --unit Percent
   
   # With deduplication string
   aws cloudwatch put-metric-alarm --alarm-name cpu-mon \
   --alarm-description "Alarm when CPU exceeds 70 percent" \
   --metric-name CPUUtilization --namespace AWS/EC2 \
   --statistic Average --period 300 --threshold 70 \
   --comparison-operator GreaterThanThreshold \
   --dimensions "Name=InstanceId,Value=i-12345678" --evaluation-periods 2 \
   --alarm-actions arn:aws:aiops:us-east-1:123456789012:investigation-group/sMwwg1IogXdvL7UZ#DEDUPE_STRING=performance \
   --unit Percent
   ```
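
After you update the alarm, you might want to confirm that the investigation action was recorded. The following is a minimal sketch under a few assumptions: the function names are hypothetical, the pattern match covers only the `aws` partition, and the query covers metric alarms only (composite alarms are returned under a different key). It uses the same `describe-alarms` command as the earlier step.

```shell
# Hypothetical helper: succeed when the argument looks like an
# investigation group action ARN (with or without a deduplication string).
is_investigation_action_arn() {
  case "$1" in
    arn:aws:aiops:*:*:investigation-group/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print the investigation actions configured on a metric alarm.
# Requires the AWS CLI and credentials that allow cloudwatch:DescribeAlarms.
show_investigation_actions() {
  local alarm_name="$1" action
  aws cloudwatch describe-alarms --alarm-names "$alarm_name" \
    --query 'MetricAlarms[].AlarmActions[]' --output text |
    tr '\t' '\n' |
    while IFS= read -r action; do
      # Text output is tab-separated, so each action is checked on its own line.
      if is_investigation_action_arn "$action"; then
        printf 'investigation action: %s\n' "$action"
      fi
    done
}
```

Running `show_investigation_actions cpu-mon` after the example commands above should list the investigation group ARN if the update succeeded. No output suggests that the action is missing.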

------
#### [ CloudFormation ]

This section includes CloudFormation templates that you can use to configure CloudWatch alarms to automatically create or update investigations. Each template requires that you specify an ARN for the `AlarmActions` parameter. For information about how to create the ARN, see [ARN format and parameters](Investigations-configure-alarms.md#Investigations-arn-format).

```
# Without deduplication string
Resources:
  MyAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmActions:
        - !Sub "arn:aws:aiops:${AWS::Region}:${AWS::AccountId}:investigation-group/{investigationGroupIdentifier}"

# With deduplication string
Resources:
  MyAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmActions:
        - !Sub "arn:aws:aiops:${AWS::Region}:${AWS::AccountId}:investigation-group/{investigationGroupIdentifier}#DEDUPE_STRING={my-dedupe-string}"
```

------
#### [ SDK ]

This section includes Java code snippets that you can use to configure CloudWatch alarms to automatically create or update investigations. Each snippet requires that you specify an ARN for the `investigationGroupArn` variable. For information about how to create the ARN, see [ARN format and parameters](Investigations-configure-alarms.md#Investigations-arn-format).

```
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.ComparisonOperator;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.PutMetricAlarmRequest;
import com.amazonaws.services.cloudwatch.model.PutMetricAlarmResult;
import com.amazonaws.services.cloudwatch.model.StandardUnit;
import com.amazonaws.services.cloudwatch.model.Statistic;

//Without deduplication string
private void putMetricAlarmWithCloudWatchInvestigationAction() {
        final AmazonCloudWatch cloudWatchClient =
                AmazonCloudWatchClientBuilder.defaultClient();
       
        Dimension dimension = new Dimension()
                .withName("InstanceId")
                .withValue("i-12345678");
        String investigationGroupArn = "arn:aws:aiops:us-east-1:123456789012:investigation-group/sMwwg1IogXdvL7UZ";
        
        PutMetricAlarmRequest request = new PutMetricAlarmRequest() 
                    .withAlarmName("cpu-mon")
                    .withComparisonOperator( 
                        ComparisonOperator.GreaterThanThreshold) 
                    .withEvaluationPeriods(2) 
                    .withMetricName("CPUUtilization") 
                    .withNamespace("AWS/EC2") 
                    .withPeriod(300) 
                    .withStatistic(Statistic.Average) 
                    .withThreshold(70.0) 
                    .withActionsEnabled(true) 
                    .withAlarmDescription("Alarm when CPU exceeds 70 percent") 
                    .withUnit(StandardUnit.Percent) 
                    .withDimensions(dimension) 
                    .withAlarmActions(investigationGroupArn);
          
        PutMetricAlarmResult response = cloudWatchClient.putMetricAlarm(request);
}

//With deduplication string
private void putMetricAlarmWithCloudWatchInvestigationActionWithDedupeString() {
        final AmazonCloudWatch cloudWatchClient =
                AmazonCloudWatchClientBuilder.defaultClient();
       
        Dimension dimension = new Dimension()
                .withName("InstanceId")
                .withValue("i-12345678");
        String investigationGroupArn = "arn:aws:aiops:us-east-1:123456789012:investigation-group/sMwwg1IogXdvL7UZ#DEDUPE_STRING=performance";
        
        PutMetricAlarmRequest request = new PutMetricAlarmRequest() 
                    .withAlarmName("cpu-mon")
                    .withComparisonOperator( 
                        ComparisonOperator.GreaterThanThreshold) 
                    .withEvaluationPeriods(2) 
                    .withMetricName("CPUUtilization") 
                    .withNamespace("AWS/EC2") 
                    .withPeriod(300) 
                    .withStatistic(Statistic.Average) 
                    .withThreshold(70.0) 
                    .withActionsEnabled(true) 
                    .withAlarmDescription("Alarm when CPU exceeds 70 percent") 
                    .withUnit(StandardUnit.Percent) 
                    .withDimensions(dimension) 
                    .withAlarmActions(investigationGroupArn);
          
        PutMetricAlarmResult response = cloudWatchClient.putMetricAlarm(request);
}
```

------

# Integration with Amazon EKS
<a name="EKS-Integration"></a>

Investigation groups in CloudWatch investigations can use information directly from your Amazon EKS cluster. To get started, grant the investigation group's IAM role access to the cluster. We recommend using the default AWS managed *access policy* `AmazonAIOpsAssistantPolicy`, which grants investigation groups access to resources in the cluster. When you use this policy, you automatically receive policy updates as they are released.

**Note**  
`AmazonAIOpsAssistantPolicy` is an access policy. The AWS managed identity policy that authorizes the access associated with CloudWatch investigations investigation groups is [https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AIOpsAssistantPolicy.html](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AIOpsAssistantPolicy.html).

Use the **Advanced Configuration** option to scope down the access provided by the access policy to a set of namespaces or the entire cluster. Alternatively, you can further scope access down by associating the access entry to a Kubernetes group RBAC permission. For more information, see [Creating access entries](https://docs.aws.amazon.com/eks/latest/userguide/creating-access-entries.html).

## Configuring the Amazon EKS access entry (Console)
<a name="EKS-Access-Entries-Console"></a>

To associate the `AmazonAIOpsAssistantPolicy` with the investigation role using the AWS Management Console, follow these steps:

1. Open the CloudWatch console and navigate to the Investigations Configuration page.

1. In the Amazon EKS Access section, select the option to associate the `AmazonAIOpsAssistantPolicy` with your investigation role.

1. Review the policy details and confirm the association.

To further customize the access scope:

1. Choose **Advanced Configuration** in the Amazon EKS Access section.

1. You will be redirected to the Amazon EKS console.

1. In the Amazon EKS console, you can:

   1. Scope the policy to specific namespaces

   1. Configure the group feature for more granular access control

## Configuring Amazon EKS Access Entries (CDK)
<a name="EKS-Access-Entries-CDK"></a>

To configure Amazon EKS Access Entries using the AWS CDK, use the following code example:

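```
    // Place this import at the top of your stack file.
    import { AccessEntry, AccessPolicy, AccessScopeType } from 'aws-cdk-lib/aws-eks';

    // eksCluster and investigationsIamRole are assumed to be defined elsewhere in your stack.
    const testAccessEntry = new AccessEntry(this, `test-access-entry`, {
        cluster: eksCluster,
        principal: investigationsIamRole.roleArn,
        accessPolicies: [
            // Apply the access policy across the entire cluster.
            AccessPolicy.fromAccessPolicyName('AmazonAIOpsAssistantPolicy', {
                accessScopeType: AccessScopeType.CLUSTER
            }),
        ],
    });
```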
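
If you prefer the AWS CLI, the same association can be sketched with the `aws eks create-access-entry` and `aws eks associate-access-policy` commands. Treat the following as an illustration under stated assumptions rather than a verified procedure: the function names are hypothetical, and the policy ARN assumes that `AmazonAIOpsAssistantPolicy` follows the usual `arn:aws:eks::aws:cluster-access-policy/` naming used by other Amazon EKS access policies.

```shell
# Hypothetical helper: build the --access-scope value. With no arguments the
# scope is the whole cluster; otherwise it is limited to the listed namespaces.
access_scope_arg() {
  if [ "$#" -eq 0 ]; then
    printf 'type=cluster\n'
  else
    # Join the namespace arguments with commas.
    local IFS=','
    printf 'type=namespace,namespaces=%s\n' "$*"
  fi
}

# Hypothetical helper: create the access entry and associate the policy.
# Usage: grant_investigation_access CLUSTER_NAME ROLE_ARN [NAMESPACE...]
# Requires the AWS CLI and permissions to manage EKS access entries.
grant_investigation_access() {
  local cluster="$1" role_arn="$2" scope
  shift 2
  scope="$(access_scope_arg "$@")"

  aws eks create-access-entry \
    --cluster-name "$cluster" --principal-arn "$role_arn"

  # Assumption: the policy ARN follows the standard EKS access policy format.
  aws eks associate-access-policy \
    --cluster-name "$cluster" --principal-arn "$role_arn" \
    --policy-arn "arn:aws:eks::aws:cluster-access-policy/AmazonAIOpsAssistantPolicy" \
    --access-scope "$scope"
}
```

Passing namespace names after the role ARN narrows the scope in the same way as the **Advanced Configuration** option described earlier. Omitting them grants cluster-wide read access.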

## AmazonAIOpsAssistantPolicy
<a name="AmazonAIOpsAssistantPolicy"></a>

The Amazon EKS access policy `AmazonAIOpsAssistantPolicy` provides comprehensive read-only access to resources in the cluster. CloudWatch investigations might not currently use information from every resource that the policy covers.

```
    - apiGroups: [""]
      resources:
        - pods
        - pods/log
        - services
        - nodes
        - namespaces
        - events
        - persistentvolumes
        - persistentvolumeclaims
        - configmaps
      verbs:
        - get
        - list

    - apiGroups: ["apps"]
      resources:
        - deployments
        - replicasets
        - statefulsets
        - daemonsets
      verbs:
        - get
        - list

    - apiGroups: ["batch"]
      resources:
        - jobs
        - cronjobs
      verbs:
        - get
        - list

    - apiGroups: ["events.k8s.io"]
      resources:
        - events
      verbs:
        - get
        - list

    - apiGroups: ["networking.k8s.io"]
      resources:
        - ingresses
        - ingressclasses
      verbs:
        - get
        - list

    - apiGroups: ["storage.k8s.io"]
      resources:
        - storageclasses
      verbs:
        - get
        - list

    - apiGroups: ["metrics.k8s.io"]
      resources:
        - pods
        - nodes
      verbs:
        - get
        - list
```

## Updates to AmazonAIOpsAssistantPolicy
<a name="AmazonAIOpsAssistantPolicy-Updates"></a>


| Change | Description | Date | 
| --- | --- | --- | 
| Add policy for CloudWatch investigations | Initial release of AmazonAIOpsAssistantPolicy | August 9, 2025 | 

# (Recommended) Best practices to enhance investigations
<a name="Investigations-RecommendedServices"></a>

As a best practice, we recommend that you enable several AWS services and features in your account that can help CloudWatch investigations discover more information in your topology and make better suggestions during investigations.

**Topics**
+ [CloudWatch agent](#Investigations-CloudWatchAgent)
+ [AWS CloudTrail](#Investigations-CloudTrail)
+ [CloudWatch Application Signals](#Investigations-ApplicationSignals)
+ [X-Ray](#Investigations-Xray)
+ [Container insights](#Investigations-ContainerInsights)
+ [Database insights](#Investigations-DatabaseInsights)

## CloudWatch agent
<a name="Investigations-CloudWatchAgent"></a>

We recommend that you install the latest version of the CloudWatch agent on your servers. Using a recent version of the CloudWatch agent enhances the ability to find issues in Amazon EC2 and Amazon EBS during investigations. At a minimum, you should use Version 1.300049.1 or later of the CloudWatch agent. For more information about the CloudWatch agent and how to install it, see [Collect metrics, logs, and traces using the CloudWatch agent](Install-CloudWatch-Agent.md).
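
To check a fleet against the minimum version, a dotted-version comparison is usually enough. The following is a minimal sketch with a hypothetical function name. It assumes that you have already obtained the installed agent version as a string, and it ignores any non-numeric build suffix on a component.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# Versions are dotted numeric strings such as 1.300049.1.
version_at_least() {
  local have="$1" need="$2" h n
  while [ -n "$have" ] || [ -n "$need" ]; do
    # Take the leading component of each version.
    h="${have%%.*}"; n="${need%%.*}"
    # Drop any non-numeric suffix, then treat a missing component as 0.
    h="${h%%[!0-9]*}"; n="${n%%[!0-9]*}"
    h="${h:-0}"; n="${n:-0}"
    if [ "$h" -gt "$n" ]; then return 0; fi
    if [ "$h" -lt "$n" ]; then return 1; fi
    # Remove the component that was just compared.
    case "$have" in *.*) have="${have#*.}" ;; *) have="" ;; esac
    case "$need" in *.*) need="${need#*.}" ;; *) need="" ;; esac
  done
  # All components were equal.
  return 0
}
```

For example, `version_at_least "$installed_version" 1.300049.1` succeeds only when the installed agent meets the minimum recommended above. Here `installed_version` is a placeholder for however you collect the version on your hosts.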

## AWS CloudTrail
<a name="Investigations-CloudTrail"></a>

We recommend that you configure your CloudTrail trails to send management events to CloudWatch Logs. CloudTrail records management events about control plane operations in your AWS account, such as configuring permissions policies and creating, modifying, and updating resources. When CloudTrail events are sent to CloudWatch Logs, CloudWatch investigations can analyze these events in your AWS account to detect changes in your account related to your investigation. For more information, see [What is AWS CloudTrail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html), [Creating a trail with the CloudTrail console](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-create-and-update-a-trail.html), [Working with CloudTrail trails](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-trails.html) and [Sending events to CloudWatch Logs](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/send-cloudtrail-events-to-cloudwatch-logs.html).

**Note**  
CloudWatch investigations can only analyze CloudTrail events from trails that send events to CloudWatch Logs in the same AWS Region where you conduct the investigation. For cross-account investigations, CloudWatch investigations reviews the CloudTrail trails in each configured account in the same Region where you conduct the investigation.
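
One way to check the same-Region requirement is to compare the Region in each trail's CloudWatch Logs log group ARN with the Region where you investigate. The following is a minimal sketch. The function names are hypothetical, and it relies on the `Name` and `CloudWatchLogsLogGroupArn` fields returned by `aws cloudtrail describe-trails`.

```shell
# Hypothetical helper: print the Region field (fourth colon-separated field) of an ARN.
arn_region() {
  printf '%s\n' "$1" | cut -d: -f4
}

# Hypothetical helper: succeed when the log group ARN ($1) is in the given Region ($2).
log_group_in_region() {
  [ "$(arn_region "$1")" = "$2" ]
}

# Hypothetical helper: list each trail with its CloudWatch Logs log group ARN.
# Trails that don't send events to CloudWatch Logs show None in the second column.
# Requires the AWS CLI and credentials that allow cloudtrail:DescribeTrails.
list_trail_log_groups() {
  aws cloudtrail describe-trails \
    --query 'trailList[].[Name, CloudWatchLogsLogGroupArn]' --output text
}
```

A trail whose log group ARN passes `log_group_in_region "$arn" us-east-1` sends events to CloudWatch Logs in `us-east-1`, so its events can be analyzed by investigations that you conduct in that Region.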

## CloudWatch Application Signals
<a name="Investigations-ApplicationSignals"></a>

CloudWatch Application Signals discovers the topology of your environment, including your applications and their dependencies. It also automatically collects standard metrics such as latency and availability. By enabling Application Signals, CloudWatch investigations can use this topology and metric information during investigations. 

For more information about Application Signals, see [Application Signals](CloudWatch-Application-Monitoring-Sections.md).

## X-Ray
<a name="Investigations-Xray"></a>

We recommend that you enable AWS X-Ray. X-Ray collects traces about requests that your applications serve. For any traced request to your application, you can see detailed information not only about the request and response, but also about calls that your application makes to downstream AWS resources, microservices, databases, and web APIs. This information can help CloudWatch investigations during investigations.

For more information, see [https://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html](https://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html).

## Container insights
<a name="Investigations-ContainerInsights"></a>

If you use Amazon ECS or Amazon EKS, we recommend that you set up Container Insights. This improves the ability of CloudWatch investigations to find issues in your containers. For more information about Container Insights and how to set it up, see [Container Insights](ContainerInsights.md).

## Database insights
<a name="Investigations-DatabaseInsights"></a>

If you use Amazon RDS, we recommend that you enable the Advanced mode of Database Insights on your database instances. Database Insights monitors database load and provides detailed performance analysis that helps CloudWatch investigations identify database-related issues during investigations. When Advanced Database Insights is enabled, CloudWatch investigations can automatically generate performance analysis reports that include detailed observations, metric anomalies, root cause analysis, and recommendations specific to your database workload. For more information about Database Insights and how to enable Advanced mode, see [Monitoring Amazon RDS databases with CloudWatch Database Insights](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_DatabaseInsights.html).

# Investigate operational issues in your environment
<a name="Investigations-Investigate"></a>

You can create investigations in several ways depending on your workflow and the source of the issue you're investigating. Once an investigation is active, you can review AI-generated suggestions, accept or discard findings, and take remediation actions through automated runbooks.

The following procedures show you how to start investigations from different entry points and how to work with active investigations:

**Contents**
+ [Create an investigation](Investigations-CreateInvestigation.md)
  + [Create an investigation from Amazon Q chat](Investigations-CreateInvestigation.md#Investigations-CreateInvestigation-QChat)
  + [Create an investigation from a CloudWatch alarm action](Investigations-CreateInvestigation.md#Investigations-CreateInvestigation-AlarmAction)
+ [Create an investigation from a CloudWatch Application Signals Service Level Objective (SLO)](Investigations-CreateInvestigation-SLO.md)
+ [View and continue an open investigation](Investigations-Continue.md)
+ [Reviewing and executing suggested runbook remediations for CloudWatch investigations](suggested-investigation-actions.md)
+ [Manage your current investigations](Investigations-Manage.md)
+ [Restart an archived investigation](Investigations-Restart.md)

# Create an investigation
<a name="Investigations-CreateInvestigation"></a>

You can start an investigation from several AWS consoles, including (but not limited to) CloudWatch alarm pages, CloudWatch metric pages, and Lambda monitoring pages.

**To start an investigation from an AWS console page**

1. In **Account-level** select the graph of the metric or alarm that you want to investigate.

1. If the top of the page has an **Investigate** button, choose it and then choose **Start new investigation**.

   Otherwise, choose the vertical ellipsis menu icon ![\[Depicts the appearance of the vertical ellipsis icon on the console\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/vmore.png) for the metric, and choose **Investigate**, **Start a new investigation**.

1. In the **Investigation** pane, enter a name for the investigation in **New investigation title**, and optionally enter notes about the selected metric or alarm. 

1. Under **Approximate impact start time**, CloudWatch investigations recommends a timestamp to investigate based on the selected telemetry. To change the timestamp of the investigation, update the date and time.

1. Then choose **Start investigation**.

   The investigation starts. CloudWatch investigations scans your telemetry data to find data that might be associated with this situation.

1. To move the investigation data to the larger pane, choose **Open in full page**.

1. For detailed instructions about steps that you can take while continuing the investigation, see [View and continue an open investigation](Investigations-Continue.md).

## Create an investigation from Amazon Q chat
<a name="Investigations-CreateInvestigation-QChat"></a>

You can ask questions about issues in your deployment in CloudWatch investigations chat. The question could be something like "Why is my Lambda function slow today?"

When you do so, CloudWatch investigations might ask follow-up questions and run a health check regarding the issue. After the health check, the chat prompts you about whether you want to start an investigation.

For more information and more sample questions, see [Chatting with Amazon Q about AWS](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/chat-with-q.html#example-questions-investigations). 

For detailed instructions about steps that you can take while continuing the investigation after it has been started, see [View and continue an open investigation](Investigations-Continue.md).

## Create an investigation from a CloudWatch alarm action
<a name="Investigations-CreateInvestigation-AlarmAction"></a>

When you create a CloudWatch alarm, you can configure it to automatically start an investigation when it goes into the ALARM state. You can do this for both metric alarms and composite alarms. For more information, see [Start a CloudWatch investigations from an alarm](Start-Investigation-Alarm.md), [Create a CloudWatch alarm based on a static threshold](ConsoleAlarms.md), and [Create a composite alarm](Create_Composite_Alarm.md).

# Create an investigation from a CloudWatch Application Signals Service Level Objective (SLO)
<a name="Investigations-CreateInvestigation-SLO"></a>

You can start an investigation from a CloudWatch Application Signals Service Level Objective (SLO) metric.

**To start an investigation from a CloudWatch Application Signals Service Level Objective (SLO)**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Navigate to the **Application Signals (APM)**, **Service Level Objectives (SLO)** console page.

1. Select an entry from the **Service Level Objectives (SLO)** list to display the metrics available for that SLO.

1. Select a metric, then choose **Investigate** from the **Action** menu.

   Alternatively, in the visualization of the metric you want to investigate, next to the more ![\[Vertical ellipsis used to display more options.\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/vmore.png) menu, select the AI ![\[Icon used to represent a feature that uses artificial intelligence .\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/cw-ai-icon.png) icon to start an investigation.
**Note**  
If you have not configured operational investigations in your account, the AI icon opens the **Operational troubleshooting** pane. Select **Get started** to configure an investigation group and then continue.

1. In the **Operational troubleshooting** pane on the **Investigate**, under **Investigation title** enter a name for the investigation and optionally enter notes about the selected metric. 

1. Under **Approximate impact start time**, CloudWatch investigations recommends a timestamp to investigate based on the selected telemetry. To change the timestamp of the investigation, update the date and time.

1. Then choose **Start investigation**.

   The investigation starts. CloudWatch investigations scans your telemetry data to find data that might be associated with this situation.

1. To move the investigation data to the larger pane, choose **Open in full page**.

1. For detailed instructions about steps that you can take while continuing the investigation, see [View and continue an open investigation](Investigations-Continue.md).

# View and continue an open investigation
<a name="Investigations-Continue"></a>

Use the steps in this section to view and continue an existing investigation.

**To view and continue an investigation**

1. If you aren't already on the page for the investigation, do the following:

   1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

   1. In the left navigation pane, choose **AI Operations**, **Investigations**.

   1. Choose the name of the investigation.

1. The **Feed** section displays the items that have been added to the investigation findings, including the metric or alarm that was originally selected to start the investigation.

   The pane on the right includes tabs. Choose the **Suggestions** tab.

1. The **Suggestions** tab displays *observations* of other telemetry that CloudWatch investigations has found that might be related to the investigation. It might also include *hypotheses*, which are possible reasons or root causes that CloudWatch investigations has found for the situation.

   Both observations and hypotheses are written in natural language by CloudWatch investigations.

   You have several options:
   + For each suggestion, you can choose **Accept** or **Discard**.

     When you choose **Accept**, the suggestion is added to the **Feed** section, and CloudWatch investigations uses this information to direct further scanning and suggestions.

     If you choose **Discard**, the suggestion is moved to the **Discarded** tab.
   + For each observation-type suggestion, you can choose to expand the graph in the **Suggestions** tab, or open it in the CloudWatch console to see more details about it.
   + Some of the observations might be results of CloudWatch Logs Insights queries that CloudWatch investigations ran as part of the investigation. When an observation is a CloudWatch Logs Insights query result, the query itself is displayed as part of the observation. You can edit the query and re-run it. To do so, choose the vertical ellipsis menu icon ![\[Depicts the appearance of the vertical ellipsis icon on the console\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/vmore.png) by the results, and then choose **Open in Logs Insights**. For more information, see [Analyzing log data with CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html).
   + If you know of telemetry in an AWS service that might apply to this investigation, you can go to that service's console and add the telemetry to the investigation. For example, to add a Lambda metric to the investigation, you can do the following:

     1. Open the Lambda console.

     1. In the **Monitor** section, find the metric.

     1. Open the vertical ellipsis context menu ![\[Depicts the appearance of the vertical ellipsis icon on the console\]](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/vmore.png) for the metric, and choose **Investigate**, **Add to investigation**. Then, in the **Investigate** pane, select the name of the investigation.
   + When you view a hypothesis in the **Suggestions** tab, you can choose **Show reasoning** to display the data that CloudWatch investigations used to generate the hypothesis. For hypotheses involving multiple resources, you may also see a visual representation showing the causal relationships between the resources as connected nodes.
   + You can choose the **Discarded** tab and view the suggestions that have been previously discarded. To add one of them to the findings, choose **Restore to findings**.
   + To add notes to the findings, choose **New note** in the **Feed** pane. Then enter your notes and choose **Add**.

1. When you add a hypothesis to the **Feed** area, it might display **Show suggested actions**. If so, choosing this displays possible actions that you can take, assuming that hypothesis is correct about the issue. Possible actions include the following:
   + **Documentation suggestions** are links to AWS documentation that can help you understand the issue that you are working on, and how to solve it. To view suggested documentation, choose its **Review** link.
   + **Runbook suggestions** are suggestions that leverage the pre-defined *runbooks* in Systems Manager Automation. Each runbook defines a number of steps for performing a task on an AWS resource. For information about continuing with a runbook action, see [Reviewing and executing suggested runbook remediations for CloudWatch investigations](suggested-investigation-actions.md).
**Important**  
There is a charge for executing an Automation runbook. However, CloudWatch investigations provides you with a preview of actions that a suggested runbook takes, giving you an opportunity to better evaluate whether to execute the runbook. For information about Automation pricing, see [AWS Systems Manager pricing for Automation](https://aws.amazon.com/systems-manager/pricing/#Automation).

1. (Optional) Choose **Incident report** to create a comprehensive incident analysis document. For more information, see [Generate incident reports](Investigations-Incident-Reports.md).

1. When you are ready to end an investigation, choose **End investigation** and then optionally add final notes. The investigation status changes to **Archived**. You can restart archived investigations by opening the investigation page and choosing **Restart investigation**.

**Note**  
At some point, you might see **Completed the analysis. Finished with the investigation.** displayed above the **Feed** area. If you then add more telemetry to the findings, this message changes and CloudWatch investigations begins scanning your telemetry again, based on the new data that you added to the findings.

# Reviewing and executing suggested runbook remediations for CloudWatch investigations
<a name="suggested-investigation-actions"></a>

When you add a hypothesis to the **Feed** area of an active investigation, CloudWatch investigations might display **Show suggested actions**. One suggested action might be to view documentation with information to help you remediate a problem manually.

Another suggestion might be to use an *Automation runbook* to attempt to automatically resolve the issue. Automation is a capability in Systems Manager, another AWS service. Automation runbooks define a series of steps, or actions, to be run on the resources that you select. Each runbook is designed to address a specific issue. Runbooks can address a variety of operational needs: creating, repairing, reconfiguring, installing, troubleshooting, remediating, duplicating, and more. For more information about Automation, see [Integration with AWS Systems Manager Automation](Investigations-Integrations.md#Investigations-Integrations-SSM).

**Before you begin**  
Before working with Automation runbooks in an investigation, be aware of the following important considerations:
+ Choosing to execute a runbook incurs charges. For information, see [AWS Systems Manager pricing](https://aws.amazon.com/systems-manager/pricing/#Automation). 
+ Root causes and runbook suggestions are powered by automated reasoning and generative artificial intelligence services.
**Important**  
You are responsible for actions that result from executing runbook steps and the choice of parameter values entered during runbook execution. You might need to edit the suggested runbook to make sure the runbook performs as expected. For more information, see [https://aws.amazon.com/ai/responsible-ai/policy/](https://aws.amazon.com/ai/responsible-ai/policy/).
+ Depending on the runbook, you might need to enter values for the runbook's **Input parameters** before the execution can run.
+ The runbook executes using the IAM permissions assigned to the operator. If necessary, sign in with different IAM permissions to execute the runbook. In addition to permissions for the actions being taken, you'll need additional Systems Manager permissions to execute runbook steps. For more information, see [Setting up Automation](https://docs.aws.amazon.com//systems-manager/latest/userguide/automation-setup.html) in the *AWS Systems Manager User Guide*.

**To review and execute suggested runbook actions for CloudWatch investigations**

1. To view information about a suggested runbook, including how to execute its steps, choose **Review**.

   On the investigation details page, choose **Suggestions**. 

1. In the **Suggestions** pane, review the list of hypotheses based on the system's analysis of the issue under investigation.

   For each hypothesis, you can choose from the following options:
   + **Show reasoning** – View more information about why the system has generated the hypothesis.
   + **View actions** – View the suggested actions for the issue. Not all hypotheses will include suggested actions.
   + **Accept** – Accept the hypothesis and add it to the investigation's **Feed** section.
**Note**  
Accepting the hypothesis doesn't automatically run the associated runbook solution. You can view suggested runbooks before accepting a hypothesis, but you must accept the hypothesis to execute a runbook.
   + **Discard** – Reject the hypothesis and don't engage with it any further.

1. After you choose **View actions**, in the **Suggested actions** pane, review the list of suggested actions you can take to address the issue. Suggested actions can include one or more of the following:
   + **AWS knowledge articles** – Provides information about steps you can take to manually address the issue, plus a link to more information.
   + **AWS documentation** – Provides links to user documentation topics related to the issue.
   + **AWS-owned runbooks** – Lists one or more Automation runbooks that are managed by AWS that you can run to attempt issue resolution.
   + **Runbooks owned by you** – Lists one or more custom Automation runbooks created by you or someone else in your account or organization, which you can run to attempt issue resolution. 
**Note**  
The system automatically generates this list of runbooks by evaluating keywords in your custom runbooks and then comparing them to terms related to the issue being investigated.   
More keyword matches mean a particular custom runbook appears higher in the **Runbooks owned by you** list.

1. After reviewing the hypothesis, you can examine a specific suggested action further and read related documentation by choosing **Learn more**. You can also choose **Review details** to inspect suggested runbooks owned by AWS or by you.

1. When choosing **Review details** for runbooks, do the following:

   1. For **Runbook description**, review the content, which provides an overview of the actions the runbook can take to remediate the issue being investigated. Choose **View steps** to visualize the runbook's workflow and drill into the details of individual steps.

   1. For **Input parameters**, specify values for any parameters required by the runbook. These parameters vary from runbook to runbook.

   1. For **Execution preview**, carefully review the information. This information explains what the scope and impact would be if you choose to execute the runbook.

      The **Execution preview** content provides the following information:
      + How many accounts and Regions the runbook operation would occur in.
      + The types of actions that would be taken, and how many of each type.

        Action types include the following:
        + `Mutating`: A runbook step would make changes to the targets through actions that create, modify, or delete resources.
        + `Non-Mutating`: A runbook step would retrieve data about resources but not make changes to them. This category generally includes `Describe`, `List`, `Get`, and similar read-only API actions.
        + `Undetermined`: An undetermined step invokes executions performed by another orchestration service like AWS Lambda, AWS Step Functions, or Run Command, a capability of AWS Systems Manager. An undetermined step might also call a third-party API or run a Python or PowerShell script. Systems Manager Automation can't detect what the outcome would be of the orchestration processes or third-party API executions, and therefore can't evaluate them. The results of those steps would have to be manually reviewed to determine their impact.

        For information about supported actions and their impact types, see [Remediation impact types of runbook actions](https://docs.aws.amazon.com/systems-manager/latest/userguide/remediation-impact-type.html) in the *AWS Systems Manager User Guide*. 

   1. Review the preview information carefully before deciding whether to proceed.

      At this point, you can choose one of the following actions:
      + Stop and do not execute the runbook.
      + Change the input parameters before executing the runbook.
      + Execute the runbook with the options you have already selected.
**Important**  
Choosing to execute the runbook incurs charges. For information, see [AWS Systems Manager pricing](https://aws.amazon.com/systems-manager/pricing/#Automation). 

1. If you want to execute the runbook, choose **Execute**.

   If you already accepted the hypothesis, the execution runs.

   If you have not already accepted the hypothesis, a dialog box prompts you to accept it before the execution runs.

After you choose **Execute** for a runbook, that action is added to the **Feed** pane of the investigation. From the investigation, you can monitor new data in the metrics in the findings to see if the runbook actions are correcting the issue.
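
To make the **Execution preview** categories concrete, the following Python sketch tallies a list of step impact types in the way the preview summarizes them ("the types of actions that would be taken, and how many of each type") and flags when manual review is needed. It is illustrative only: the function name `summarize_preview` and the input shape are hypothetical, it calls no AWS API, and it is not how Systems Manager computes the preview.

```python
from collections import Counter

# The three impact types named in the Execution preview description above.
IMPACT_TYPES = ("Mutating", "Non-Mutating", "Undetermined")


def summarize_preview(step_impacts):
    """Count steps per impact type and flag what to review (hypothetical helper)."""
    unknown = set(step_impacts) - set(IMPACT_TYPES)
    if unknown:
        raise ValueError(f"unexpected impact types: {sorted(unknown)}")
    counts = Counter(step_impacts)
    return {
        "counts": {impact: counts.get(impact, 0) for impact in IMPACT_TYPES},
        # Mutating steps create, modify, or delete resources.
        "changes_resources": counts["Mutating"] > 0,
        # Undetermined steps can't be evaluated automatically.
        "needs_manual_review": counts["Undetermined"] > 0,
    }


if __name__ == "__main__":
    # Made-up preview for a four-step runbook.
    steps = ["Non-Mutating", "Mutating", "Undetermined", "Non-Mutating"]
    print(summarize_preview(steps))
```

A summary like this can serve as a quick checklist: any `Mutating` count means resources will change, and any `Undetermined` count means the affected steps have to be inspected by hand before you choose **Execute**.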

# Manage your current investigations
<a name="Investigations-Manage"></a>

You can view a list of your current investigations, end active investigations, reopen archived investigations, and rename or delete investigations. You can take these actions on individual investigations or in bulk.

**To manage your current investigations**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the left navigation pane, choose **AI Operations**, **Investigations**.

1. (Optional) Filter the investigations displayed in the list by name or investigation state.

1. Select the checkboxes for the investigation or investigations that you want to take action on.

1. Choose **End investigation**, **Rename**, or **Delete**. 

   You will be prompted to confirm your action or to input a new investigation title.

# Restart an archived investigation
<a name="Investigations-Restart"></a>

You can restart archived investigations.

**To restart an archived investigation**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the left navigation pane, choose **AI Operations**, **Investigations**.

1. Choose the name of an archived investigation.

1. Choose **Restart investigation**.

1. (Optional) Update incident reports.

   Any incident reports generated from the original investigation remain available in the investigation history. You can access these reports from the investigation details page. If the restarted investigation discovers more facts, you can regenerate the incident report using the following steps:

   1. Choose **Incident report** to regenerate your incident report with new or updated facts. 

   1. From the **Incident report** page, review updated facts.

   1. Choose **Regenerate** to update your incident report. If the **Regenerate** button is disabled, no new facts are present.

   We recommend that you don't leave investigations open indefinitely, because alarm state transitions related to the investigation will keep being added to the investigation as long as it is open.

# Cross-account investigations
<a name="Investigations-cross-account"></a>

 Cross-account CloudWatch investigations enables you to investigate application issues that span multiple AWS accounts from a centralized monitoring account. This feature allows you to correlate telemetry data, metrics, and logs across up to 25 accounts, in addition to the monitoring account, to gain comprehensive visibility into distributed applications and troubleshoot complex multi-account scenarios. 

**Topics**
+ [Prerequisites](#Investigations-cross-account-prereq)
+ [Set up your monitoring account for cross-account access](#Investigations-cross-account-monitoring-account)
+ [Set up your source accounts for cross-account access](#Investigations-cross-account-source-account)
+ [Investigating multi-account issues](#Investigations-cross-account-investigation)

## Prerequisites
<a name="Investigations-cross-account-prereq"></a>
+ Multi-account investigation requires cross-account observability to already be set up so that you can view cross-account telemetry. To complete this prerequisite, set up either [cross-account observability](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/centralize-monitoring-by-using-amazon-cloudwatch-observability-access-manager.html) or the [cross-account dashboard](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Cross-Account-Cross-Region.html).
+ Set up an investigation group. For cross-account observability, this should be in the monitoring account. You can also set up investigation groups in the source accounts and run single-account investigations there.

## Set up your monitoring account for cross-account access
<a name="Investigations-cross-account-monitoring-account"></a>

**To set up your monitoring account for cross-account access**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the left navigation pane, choose **AI Operations**, **Configuration**.

1. Under **Configure Cross-account access**, select **Configure**.

1. Add the Account ID for up to 25 accounts under the **List source accounts** section.

1. Update your IAM role.

   1. Automatically
      + If you choose **Automatically update the assistant role (recommended)**, this creates a customer managed policy named `AIOpsAssistantCrossAccountPolicy-${guid}` that includes the `sts:AssumeRole` statements needed to assume the assistant role in the specified source accounts. Choosing the automatic update option defaults the IAM role name to `AIOps-CrossAccountInvestigationRole` in the source accounts.

        ```json
        {
            "Version": "2012-10-17",
            "Statement": {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": [
                    "arn:aws:iam::777777777777:role/AIOps-CrossAccountInvestigationRole",
                    "arn:aws:iam::555555555555:role/AIOps-CrossAccountInvestigationRole",
                    "arn:aws:iam::666666666666:role/AIOps-CrossAccountInvestigationRole"
                ]
            }
        }
        ```

      + If the monitoring account owner removes a source account from the cross-account configuration, the IAM policy will not update automatically. You must manually update the IAM role and policy to ensure it always has the minimum permissions possible.
      + You might reach the limit of managed policies per role if the permissions are not manually updated when a source account is removed. You must delete any unused managed policies attached to your investigation role.

   1. Manually
      + The following example shows the trust policy required for the assistant role: 

        ```json
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "AllowAIOpsAssumeRole",
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "aiops.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole",
                    "Condition": {
                        "StringEquals": {
                            "sts:ExternalId": "arn:aws:aiops:us-east-1:123456789012:investigation-group/AaBbcCDde1EfFG2g"
                        }
                    }
                }
            ]
        }
        ```

        You can use the AWS CLI to create the custom source account role and then attach the `AIOpsAssistantPolicy` to the role using the following commands, replacing the placeholder values with the appropriate values for your environment: 

        ```
        aws iam create-role \
          --role-name custom-role-name \
          --assume-role-policy-document '{
            "Version": "2012-10-17",
            "Statement": [
              {
                "Effect": "Allow",
                "Principal": { "AWS": "investigation-group-role-arn" },
                "Action": "sts:AssumeRole",
                "Condition": {
                  "StringEquals": { "sts:ExternalId": "investigation-group-arn" }
                }
              }
            ]
          }'

        aws iam attach-role-policy \
          --role-name custom-role-name \
          --policy-arn arn:aws:iam::aws:policy/AIOpsAssistantPolicy
        ```
      + To grant cross-account access, the permission policy of the assistant role in the monitoring account must contain the following statement. If you configure the monitoring account manually, the role name can be whatever you choose; it does not default to `AIOps-CrossAccountInvestigationRole`. Make sure to specify the name of the assistant role for each of the source accounts.

        ```json
        {
            "Version": "2012-10-17",
            "Statement": {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": [
                    "arn:aws:iam::777777777777:role/custom_source_account_role_name",
                    "arn:aws:iam::555555555555:role/custom_source_account_role_name",
                    "arn:aws:iam::666666666666:role/custom_source_account_role_name"
                ]
            }
        }
        ```

      + Use the AWS CLI to update the monitoring account investigation group with the custom source account role ARN using the following command, replacing the placeholder values with the appropriate values for your environment: 

        ```
        aws aiops update-investigation-group \
          --identifier investigation-group-id \
          --cross-account-configurations sourceRoleArn=sourceRoleArn1 sourceRoleArn=sourceRoleArn2
        ```

        For more details on this command, see the [AWS CLI Command Reference](https://docs.aws.amazon.com/cli/latest/reference/aiops/update-investigation-group.html).
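
The `Resource` list in the monitoring account's permission policy follows directly from the source account IDs and the role name, so it can be generated instead of typed by hand. The following Python sketch shows one way to do that. The function name `build_assume_role_policy` and its validation rules are assumptions made for this example and are not part of any AWS SDK; the 25-account limit and the default role name come from the procedure above, and the `aws` partition in the ARN is assumed.

```python
import json
import re

# Limit and default role name taken from the procedure above.
MAX_SOURCE_ACCOUNTS = 25
DEFAULT_SOURCE_ROLE = "AIOps-CrossAccountInvestigationRole"


def build_assume_role_policy(account_ids, role_name=DEFAULT_SOURCE_ROLE):
    """Return the sts:AssumeRole permission policy as a dict (hypothetical helper)."""
    if not account_ids:
        raise ValueError("at least one source account ID is required")
    if len(account_ids) > MAX_SOURCE_ACCOUNTS:
        raise ValueError(f"at most {MAX_SOURCE_ACCOUNTS} source accounts are supported")
    for account_id in account_ids:
        # AWS account IDs are 12-digit strings.
        if not re.fullmatch(r"\d{12}", account_id):
            raise ValueError(f"not a 12-digit account ID: {account_id!r}")
    return {
        "Version": "2012-10-17",
        "Statement": {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            # Assumes the standard "aws" partition.
            "Resource": [
                f"arn:aws:iam::{account_id}:role/{role_name}"
                for account_id in account_ids
            ],
        },
    }


if __name__ == "__main__":
    policy = build_assume_role_policy(["777777777777", "555555555555"])
    print(json.dumps(policy, indent=4))
```

If you use a custom source account role name, pass it as `role_name`. Regenerating the policy after you remove a source account is one way to keep the role at the minimum permissions described earlier.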

## Set up your source accounts for cross-account access
<a name="Investigations-cross-account-source-account"></a>

1. Provision an IAM role with the name `AIOps-CrossAccountInvestigationRole` if you selected the **Automatically update the assistant role** option to set up the monitoring account. If you used the manual setup option, provision the IAM role with your customized source account role name.

   1. Attach the AWS managed policy `AIOpsAssistantPolicy` to the role in the IAM console.

   1. The trust policy of the role in the source account looks like the following. The `sts:ExternalId` condition must be specified in the policy. Use the monitoring account investigation group ARN as its value.

      ```json
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                      "AWS": "arn:aws:iam::123456789012:role/investigation-role-name"
                  },
                  "Action": "sts:AssumeRole",
                  "Condition": {
                      "StringEquals": {
                          "sts:ExternalId": "investigation-group-arn"
                      }
                  }
              }
          ]
      }
      ```

1. Repeat this setup in each of the source accounts.

1. If you set up the monitoring account role through the console, the role name of the source account defaults to `AIOps-CrossAccountInvestigationRole`.

1. Confirm access by logging into the monitoring account, navigating to **Investigation Group**, then **Configuration**, and then choosing **Cross-account setup**.

   Make sure the source account shows up in the cross-account configuration, and the status is **Linked to monitoring account**.
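
Because the same trust policy is needed in every source account, it can help to generate it from the two values that vary. The following Python sketch builds the trust policy shown above from the monitoring account's investigation group role ARN and investigation group ARN. The function name `build_source_trust_policy` and its basic input checks are assumptions made for this example; they are not an AWS API.

```python
import json


def build_source_trust_policy(investigation_role_arn, investigation_group_arn):
    """Return the source-account trust policy as a dict (hypothetical helper)."""
    if not investigation_role_arn.startswith("arn:"):
        raise ValueError("investigation_role_arn must be an ARN")
    if ":investigation-group/" not in investigation_group_arn:
        raise ValueError("investigation_group_arn must be an investigation group ARN")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The investigation group role in the monitoring account.
                "Principal": {"AWS": investigation_role_arn},
                "Action": "sts:AssumeRole",
                # The external ID is the monitoring account investigation group ARN.
                "Condition": {
                    "StringEquals": {"sts:ExternalId": investigation_group_arn}
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_source_trust_policy(
        "arn:aws:iam::123456789012:role/investigation-role-name",
        "arn:aws:aiops:us-east-1:123456789012:investigation-group/AaBbcCDde1EfFG2g",
    )
    print(json.dumps(policy, indent=4))
```

You can save the printed JSON to a file and pass it to `aws iam create-role` with `--assume-role-policy-document file://trust-policy.json`, where `trust-policy.json` is a file name chosen for this example.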

## Investigating multi-account issues
<a name="Investigations-cross-account-investigation"></a>

After you set up CloudWatch cross-account observability or the cross-account dashboard, you can view cross-account telemetry in your monitoring account and start investigations from it. To run an investigation into a source account, you must add cross-account telemetry from that source account to the investigation.

For detailed information about how to create an investigation, see [Investigate operational issues in your environment](Investigations-Investigate.md).

# Generate incident reports
<a name="Investigations-Incident-Reports"></a>

Incident reports help you write a report about your incident investigation more quickly and easily. You can use this report to provide details to management or to help your team learn from the incident and take actions to prevent similar occurrences in the future. The structure of the report is based on industry standards for these types of reports and can be copied into other repositories for long-term retention.

When you use the AWS Management Console to create an *investigation group* resource in CloudWatch investigations, an IAM role is created for the group to give it access to resources during the investigation. Generating CloudWatch investigations incident reports requires additional permissions be granted to your investigation group. The new managed policy `AIOpsAssistantIncidentReportPolicy` provides the required permissions and is automatically added to investigation groups created using the AWS Management Console after October 10, 2025. For more information, see [AIOpsAssistantIncidentReportPolicy](managed-policies-cloudwatch.md#managed-policies-QInvestigations-AIOpsAssistantIncidentReportPolicy).

**Note**  
If you are using the CDK or SDK, you must explicitly add the investigation group role and specify the role policy or equivalent inline permissions on the role. For more details about permissions, see [Security in CloudWatch investigations](Investigations-Security.md).

These reports capture investigation findings, root causes, timeline events, and recommended corrective actions in a structured format that can be easily shared with stakeholders and used for organizational learning.

Incident report generation is included at no additional charge for all CloudWatch investigations users and integrates seamlessly with your investigation workflow.

**How incident reports work**

1. Run an investigation on your incident.

1. Accept at least one hypothesis. Each hypothesis you accept is considered for the report. The hypothesis doesn't need to be 100% accurate.

1. Choose **Incident report**. During the investigation the AI parsed the data collected for your investigation and derived facts. Facts are atomic pieces of information about your incident that form the basis of generating the report. Fact extraction can take a few minutes.

1. When fact extraction is finished, you can review the facts available in the following areas:

   1. **Incident Overview** – High-level overview of the incident including its severity, duration, and operational hypothesis.

   1. **Impact Assessment** – Metrics and analysis related to the impact of the incident on customers, service function, and business operations.

   1. **Detection and Response** – Metrics and analysis related to how and when the incident was detected and how you responded to the incident.

   1. **Root Cause Analysis** – Detailed analysis of underlying causes based on investigation hypotheses.

   1. **Mitigation and Resolution** – Metrics and analysis related to mitigation steps and resolution measures, along with the time measurement for incident mitigation and resolution.

   1. **Learning and Next Steps** – A list of recommended actions for your team to consider, automatically generated from the investigation findings. These recommendations may include preventive measures against similar incidents, as well as suggested improvements to your monitoring and response processes.

1. After reviewing the facts, choose **Generate report** to create a comprehensive analysis of the incident. While the selected facts serve as key reference points, the report draws from all available information gathered during the investigation. This process can take a few minutes.

1. After generating the report, you can then either:
   + Use the report as is:
     + Copy it to edit in your external editor if needed
     + Save it for later reference
   + Enhance the report by adding more data:
     + Choose **Add facts** (recommended method) to input additional text-based content such as incident tickets or custom narratives. The AI will analyze this content to augment existing facts or infer new ones.
     + Edit facts directly (use sparingly) - Manually edited facts may create inconsistencies with the investigation timeline. This should be used only as a last resort when **Add facts** doesn't achieve the desired result.
   + Choose **Regenerate report** to produce a new report using the updated information.

**Topics**
+ [Understanding AI-derived facts in incident reports](Investigations-IncidentReports-ai-facts.md)
+ [Incident report terminology](Investigations-IncidentReports-terms.md)
+ [Generate a report from an investigation](Investigations-IncidentReports-Generate.md)
+ [Using 5 Whys analysis in incident reports](incident-report-5whys.md)

# Understanding AI-derived facts in incident reports
<a name="Investigations-IncidentReports-ai-facts"></a>

AI-derived facts form the foundation of CloudWatch investigations incident reports, representing information that the AI system considers objectively true or highly probable based on comprehensive analysis of your AWS environment. These facts emerge through a sophisticated process that combines machine learning pattern recognition with systematic verification methods, creating a robust framework for incident analysis that maintains the operational rigor required for production environments.

Understanding how AI-derived facts are developed helps you evaluate their reliability and make informed decisions during incident response. The process represents a hybrid approach where artificial intelligence augments human expertise rather than replacing it, ensuring that the insights generated are both comprehensive and trustworthy.

## The development process of AI-derived facts
<a name="Investigations-ai-facts-development"></a>

The journey from raw telemetry data to actionable AI-derived facts begins with pattern observation, where the CloudWatch investigations AI analyzes vast amounts of AWS telemetry using sophisticated machine learning algorithms. The AI examines your CloudWatch metrics, logs, and traces across multiple dimensions simultaneously, identifying recurring patterns and relationships that might not be immediately apparent to human operators. The analysis encompasses temporal patterns that reveal when incidents typically occur and their duration characteristics, service correlations that show how different AWS services interact during failure scenarios, metric anomalies that precede or accompany incidents, and log event sequences that indicate specific failure modes.

Consider, for example, how the AI might observe that in your environment, Amazon EC2 instance CPU utilization consistently spikes to above 90% approximately 15 minutes before application response times exceed acceptable thresholds. This temporal relationship, when observed across multiple incidents, becomes a significant pattern worthy of further investigation. The AI doesn't simply note the correlation; it measures the statistical significance of the relationship and considers various confounding factors that might influence the pattern.
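
To illustrate what a temporal relationship of this kind looks like in data, the following Python sketch searches for the time shift at which one metric series correlates most strongly with another. It is a toy example under stated assumptions: the data is synthetic, the functions `pearson` and `best_lag` are written for this illustration, and nothing here represents the algorithm that CloudWatch investigations uses. It also stops at finding the strongest lag and does not test statistical significance or account for confounding factors.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def best_lag(leading, lagging, max_lag):
    """Return (lag, r): the shift in samples, from 0 to max_lag, at which
    `leading` correlates most strongly with the later values of `lagging`."""
    if max_lag >= len(leading) - 1:
        raise ValueError("max_lag leaves fewer than 2 overlapping points")
    best = (0, pearson(leading, lagging))
    for lag in range(1, max_lag + 1):
        # Compare leading[t] with lagging[t + lag].
        r = pearson(leading[:-lag], lagging[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


if __name__ == "__main__":
    # Synthetic 5-minute samples: the latency spike follows the CPU spike
    # by 3 samples (15 minutes).
    cpu = [40, 42, 41, 95, 96, 94, 43, 41, 40, 42, 41, 40]
    latency = [120, 118, 121, 119, 120, 122, 480, 490, 470, 125, 119, 121]
    lag, r = best_lag(cpu, latency, max_lag=5)
    print(f"strongest correlation at a lag of {lag} samples (r = {r:.2f})")
```

A strong correlation at a particular lag is only a lead for further investigation; as discussed later in this topic, it does not by itself establish causation.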

From these observed patterns, the AI moves into hypothesis generation, formulating potential explanations for the relationships it has discovered. This process involves creating multiple competing hypotheses and ranking them by probability based on the strength of supporting evidence. When the AI observes that CPU spikes precede response time degradation, it might generate several hypotheses: resource exhaustion due to insufficient compute capacity, memory leaks causing increased CPU overhead, or inefficient algorithms triggered by specific input patterns. Each hypothesis receives a preliminary confidence level based on how well it explains the observed data and aligns with known AWS service behaviors.

The human verification and validation of these hypotheses ensures that these AI-generated insights meet operational standards before becoming facts in your incident reports. This process involves correlating AI-derived patterns with established AWS service behavior models, checking consistency with industry best practices for incident response, and validating against historical incident data from similar environments. The AI must demonstrate that its findings are reproducible across different analysis methods and time periods, meet statistical significance requirements for operational decision-making, align with empirical observations of AWS service behavior, and provide actionable insights for incident resolution or prevention.

Throughout this process, the AI faces several inherent challenges that you should understand when interpreting AI-derived facts. The distinction between correlation and causation remains a fundamental challenge; while the AI might identify strong correlations between network traffic spikes and incident occurrence, establishing direct causation requires additional investigation and domain expertise. Hidden variables that exist outside the scope of AWS telemetry, such as third-party service dependencies or external network provider issues, may influence incidents without being captured in the AI analysis. The quality of AI-derived facts depends entirely on the completeness and accuracy of the underlying CloudWatch data, making comprehensive monitoring coverage essential for reliable insights.

Novel incident patterns present another challenge. Failure modes that are not represented in the AI's training data are unfamiliar to it, and AI systems often struggle to interpret them. This limitation underscores the importance of human expertise in interpreting AI-derived facts and supplementing them with domain knowledge and contextual understanding.

## Applying AI-derived facts in incident response
<a name="Investigations-ai-facts-practical-application"></a>

AI excels at identifying patterns across large datasets that would be impractical for humans to analyze manually, providing insights that can significantly accelerate incident diagnosis and resolution. AI works best when combined with human expertise that can provide context, validate conclusions, and identify factors that may not be captured in telemetry data.

The most effective approach involves treating AI-derived facts as highly informed starting points for investigation rather than definitive conclusions. When the AI identifies a fact such as "Database connection pool exhaustion preceded the incident by 8 minutes," this provides a valuable lead that can be quickly verified through targeted analysis of database metrics and application logs. The fact gives you a specific timeframe and potential root cause to investigate, dramatically reducing the time needed to identify the issue compared to manually searching through all available telemetry.

Data quality plays a crucial role in the reliability of AI-derived facts. Comprehensive CloudWatch monitoring coverage provides the AI access to complete and accurate information for analysis. Gaps in monitoring can lead to incomplete or misleading facts, as the AI can only work with the data available to it. Organizations that use thorough observability practices that include detailed metrics collection, comprehensive logging, and distributed tracing are more likely to have accurate and actionable AI-derived facts in their incident reports.

# Incident report terminology
<a name="Investigations-IncidentReports-terms"></a>

The following terms are used in CloudWatch investigations incident reports:

AI-derived fact  
A piece of information or observation that the AI system considers to be objectively true or highly probable based on the available data, telemetry, logs, and historical patterns within AWS services. These facts are derived through algorithmic analysis and machine learning models, and while they are treated as reliable by the system, they should be subject to human verification, especially in critical decision-making contexts. AI-derived facts may include correlations between events, anomaly detections, or inferences about system behavior that might not be immediately apparent to human operators.

Corrective actions  
Specific, actionable steps recommended by CloudWatch investigations to address the root cause of an incident and prevent its recurrence, based on AWS best practices and the specific context of the affected resources.

Fact categories  
Structured groupings of incident-related information, such as impact metrics, detection details, and mitigation steps, used to organize data for report generation.

Impact assessment  
A quantitative and qualitative evaluation of an incident's effects on system performance, user experience, and business operations, derived from CloudWatch metrics and other AWS service data added to the investigation.

Incident report generation  
An automated process that creates comprehensive documentation of an operational incident, including its timeline, impact, root cause, and resolution steps, based on data collected during a CloudWatch investigations investigation.

Investigation Feed  
A chronological display of accepted observations, hypotheses, and user-added notes within a CloudWatch investigations investigation, serving as the primary record of the investigation's progress and findings.

Lessons learned  
Automatically generated insights and improvement opportunities identified through the incident investigation process, aimed at enhancing system reliability, operational efficiency, and incident response capabilities across the organization.

Report assessment  
An automated evaluation of the generated incident report, identifying potential data gaps or areas requiring additional information to improve report completeness and quality.

Root cause analysis  
A systematic process of identifying the fundamental reason for an operational issue, using the AI-driven hypotheses and correlations that CloudWatch investigations produces across multiple AWS services.

Suggestions tab  
A feature in CloudWatch investigations that presents AI-generated observations and hypotheses about potential causes or related issues, based on analysis of system telemetry and logs.

Timeline events  
A chronological sequence of significant occurrences during an incident, automatically extracted from CloudWatch logs, metrics, and other AWS service data to provide a clear overview of incident progression.

# Generate a report from an investigation
<a name="Investigations-IncidentReports-Generate"></a>

You can generate incident reports from in-progress or completed investigations. Incident reports generated early in an investigation may not include key facts such as root causes and recommended actions. While the investigation is active, you can edit the available facts to supplement the investigation with additional information. After the investigation has ended, you can't edit or add facts.

**Prerequisites**

Before generating an incident report, confirm that the following requirements are met:
+ Ensure the investigation group uses the required KMS key and has appropriate IAM policies attached to its role for decrypting data from AWS services. If your AWS resources are encrypted with customer-managed KMS keys, you must add IAM policy statements to the investigation group role to grant CloudWatch investigations the permissions needed to decrypt and access this data.
+ The investigation group role has been granted the following permissions:
  + `aiops:GetInvestigation`
  + `aiops:ListInvestigationEvents`
  + `aiops:GetInvestigationEvent`
  + `aiops:PutFact`
  + `aiops:UpdateReport`
  + `aiops:CreateReport`
  + `aiops:GetReport`
  + `aiops:ListFacts`
  + `aiops:GetFact`
  + `aiops:GetFactVersions`
**Note**  
You can add these permissions as an inline policy to the investigation group role, or attach an additional permissions policy to the investigation group role. For more information, see [Permissions for incident report generation](Investigations-Security.md#Investigations-Security-IAM-IRG).  
The new managed policy `AIOpsAssistantIncidentReportPolicy` provides the required permissions and is automatically added to investigation groups created after October 10, 2025. For more information, see [AIOpsAssistantIncidentReportPolicy](managed-policies-cloudwatch.md#managed-policies-QInvestigations-AIOpsAssistantIncidentReportPolicy).
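
As a quick pre-flight check, the permission list above can be compared against a policy document before you attach it to the investigation group role. The following Python sketch is illustrative only: the helper name `missing_report_actions` and the sample policy are assumptions, and the check is a plain string comparison that does not evaluate wildcards, `Deny` statements, resources, or conditions the way IAM does.

```python
import json

# Actions listed in the prerequisites above.
REQUIRED_ACTIONS = {
    "aiops:GetInvestigation",
    "aiops:ListInvestigationEvents",
    "aiops:GetInvestigationEvent",
    "aiops:PutFact",
    "aiops:UpdateReport",
    "aiops:CreateReport",
    "aiops:GetReport",
    "aiops:ListFacts",
    "aiops:GetFact",
    "aiops:GetFactVersions",
}


def missing_report_actions(policy_document: dict) -> set[str]:
    """Return required actions that no Allow statement names explicitly.

    Simplification: exact string matching only. Wildcards such as "aiops:*"
    are not expanded, and Deny statements and conditions are ignored.
    """
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    allowed = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        allowed.update(actions)
    return REQUIRED_ACTIONS - allowed


# Hypothetical policy that grants only part of the list.
example_policy = json.loads("""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["aiops:GetInvestigation", "aiops:CreateReport"],
            "Resource": "*"
        }
    ]
}
""")

print(sorted(missing_report_actions(example_policy)))
```

An empty result means every listed action appears by name in an `Allow` statement. It does not confirm that the role is otherwise configured correctly.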

**To generate an incident report**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the left navigation pane, choose **AI Operations**, **Investigations**.

1. Choose the name of an investigation.

1. On the investigation page, under **Feed**, accept any additional relevant hypotheses and add any additional notes to the investigation.
**Note**  
Report generation requires an investigation with at least one accepted hypothesis.

1. At the top of the investigation page, choose **Incident report**. Wait while the relevant facts of the investigation are collected and synced.

1. On the **Incident Report** page, review the facts being used to generate the report. The facts are available in the right pane. Navigate through the fact category tabs using the left and right arrows, or expand the pane to see all of the categories.

   1. Choose **Edit** on a fact panel to manually add or edit the data in that category.

   1. Choose **View details** on a fact panel to see the supporting evidence and fact history gathered by the AI assistant. You can also choose **Edit** within the fact detail window.

   1. Choose **Add facts** if you want to provide additional context to the investigation, such as external events or extenuating circumstances.

1. Choose **Generate report**.

   CloudWatch investigations will analyze the investigation data and generate a structured report. This process might take some time.

1. Review the generated report in the preview pane. The report will include:
   + Automatically extracted timeline events
   + Root cause analysis based on accepted hypotheses
   + Impact assessment derived from investigation telemetry
   + Recommended corrective actions and lessons learned following AWS best practices

1. To retain a copy of the report in a different location, you can choose to copy the text of the report and paste it into your desired location.

1. Choose **Report assessment** to review a list of data gaps in the report. You can use this information to gather additional data for the report and then update the facts accordingly and regenerate the report.

# Using 5 Whys analysis in incident reports
<a name="incident-report-5whys"></a>

When generating incident reports, CloudWatch investigations can perform a 5 Whys root cause analysis to systematically identify the underlying causes of operational issues. This structured approach enhances your incident reports with deeper insights and actionable remediation steps.

This feature uses Amazon Q to provide a conversational chat. The user signed in to the AWS Management Console must have the following permissions:

```
{ 
    "Sid" : "AmazonQAccess",
    "Effect" : "Allow",
    "Action" : [
       "q:StartConversation", 
       "q:SendMessage", 
       "q:GetConversation", 
       "q:ListConversations", 
       "q:UpdateConversation", 
       "q:DeleteConversation", 
       "q:PassRequest" 
     ],
    "Resource" : "*"
 }
```

You can add these permissions directly, or by attaching either the [AIOpsConsoleAdminPolicy](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AIOpsConsoleAdminPolicy.html) or [AIOpsOperatorAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AIOpsOperatorAccess.html) managed policy to the user or role. 
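
If you add the permissions directly, note that the statement shown above is a fragment: a standalone IAM policy also needs the enclosing `Version` and `Statement` elements. The following Python sketch shows one way to assemble the complete document. The helper name `build_policy_document` is an assumption made for illustration.

```python
import json

# The statement from the example above, expressed as a Python dictionary.
AMAZON_Q_STATEMENT = {
    "Sid": "AmazonQAccess",
    "Effect": "Allow",
    "Action": [
        "q:StartConversation",
        "q:SendMessage",
        "q:GetConversation",
        "q:ListConversations",
        "q:UpdateConversation",
        "q:DeleteConversation",
        "q:PassRequest",
    ],
    "Resource": "*",
}


def build_policy_document(*statements: dict) -> dict:
    """Wrap one or more statements in a complete IAM policy document."""
    return {"Version": "2012-10-17", "Statement": list(statements)}


policy_json = json.dumps(build_policy_document(AMAZON_Q_STATEMENT), indent=4)
print(policy_json)
```

The printed JSON can be pasted into the IAM console policy editor or supplied as the policy document when you create an inline or customer managed policy.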

## What is 5 Whys analysis?
<a name="5whys-overview"></a>

The 5 Whys is a root cause analysis technique that asks "why" repeatedly to drill down from incident symptoms to fundamental causes. Each answer becomes the basis for the next question, creating a logical chain that reveals the true root cause rather than just surface-level symptoms.

During incident report generation, CloudWatch investigations uses this method to analyze investigation findings and provide structured root cause analysis that goes beyond immediate technical failures to identify process, configuration, or systemic issues.

## Benefits for incident reporting
<a name="why-5whys-incidents"></a>

Including 5 Whys analysis in incident reports provides several advantages:
+ **Comprehensive root cause identification** - Moves beyond immediate technical causes to identify underlying process or system issues
+ **Actionable remediation plans** - Provides specific, targeted actions to prevent recurrence rather than temporary fixes
+ **Organizational learning** - Documents the complete causal chain for future reference and team knowledge sharing
+ **Structured analysis** - Ensures systematic investigation rather than ad-hoc problem solving

## Example scenarios in incident reports
<a name="5whys-incident-examples"></a>

### Database connection failure incident
<a name="example-database-outage"></a>

**Initial incident:** E-commerce application experiencing widespread 500 errors

1. **Why 1:** Why are users getting 500 errors? The application cannot connect to the primary database.

1. **Why 2:** Why can't the application connect to the database? The database instance ran out of available connections.

1. **Why 3:** Why did the database run out of connections? A batch processing job opened many connections without properly closing them.

1. **Why 4:** Why didn't the batch job close connections properly? The job's error handling doesn't include connection cleanup in failure scenarios.

1. **Why 5:** Why wasn't proper error handling implemented? Code review process doesn't include specific checks for resource management patterns.

**Root cause:** Inadequate code review standards for resource management

**Recommended actions:** Update code review checklist, implement connection pooling monitoring, add automated resource leak detection

### Performance degradation incident
<a name="example-auto-scaling"></a>

**Initial incident:** API response times increased from 200ms to 5000ms during traffic spike

1. **Why 1:** Why did response times increase? CPU utilization reached 100% on all application instances.

1. **Why 2:** Why didn't auto scaling add more instances? Auto scaling was triggered but new instances failed health checks.

1. **Why 3:** Why did new instances fail health checks? The application startup process takes 8 minutes, longer than the health check timeout.

1. **Why 4:** Why does startup take so long? The application downloads large configuration files from S3 on every startup.

1. **Why 5:** Why wasn't this startup delay considered in auto scaling configuration? Performance testing was done with pre-warmed instances, not cold starts.

**Root cause:** Performance testing methodology doesn't reflect production auto scaling scenarios

**Recommended actions:** Include cold start testing, optimize application startup, adjust health check timeouts, implement configuration caching

### Complex incident with branch analysis
<a name="example-complex-branch"></a>

**Initial incident:** OpenSearch Serverless customers experienced 48.3% availability degradation for 11 hours

**Main analysis chain:**

1. **Why 1:** Why did customers experience service degradation? Service availability dropped to 48.3% due to incorrect ingester scaling.

1. **Why 2:** Why was ingester scaling incorrect? CortexOperator reduced ingesters from 223 to 174 due to AZ balance miscalculation.

1. **Why 3:** Why did CortexOperator miscalculate AZ balance? The code couldn't process new Kubernetes label formats after version 1.17 upgrade.

1. **Why 4 (Branch A - Technical):** Why didn't the code handle new label formats? The code expected 'failure-domain.beta.kubernetes.io/zone' labels but Kubernetes 1.17 changed to 'topology.kubernetes.io/zone'.

1. **Why 5 (Branch A):** Why wasn't backward compatibility implemented? The label format change wasn't documented in the upgrade notes reviewed during deployment planning.

**Branch B - Process Analysis:**

1. **Why 4 (Branch B - Process):** Why wasn't this caught in testing? Integration tests used pre-configured clusters with old label formats.

1. **Why 5 (Branch B):** Why didn't testing include label format validation? Test environment setup didn't mirror production Kubernetes version upgrade sequence.

**Root causes identified:**
+ Technical: Missing backward compatibility for Kubernetes label format changes
+ Process: Testing methodology doesn't validate version upgrade impacts

**Integrated remediation plan:** Implement label format detection logic, enhance upgrade testing procedures, add automated compatibility validation, and establish version change impact assessment process.

## Using the guided 5 Whys workflow
<a name="accessing-5whys"></a>

CloudWatch investigations provides a guided 5 Whys analysis workflow to help you address missing facts and strengthen your incident reports. This feature appears as a suggested workflow when the system identifies opportunities to enhance root cause analysis.

### Interactive analysis experience
<a name="interactive-analysis"></a>

The 5 Whys analysis in CloudWatch investigations uses an interactive, chat-based approach that guides you through the investigation process. This conversational method helps ensure comprehensive analysis while maintaining logical flow between questions.

**Key features of the interactive experience:**
+ **Fact-based initialization** - The system presents relevant facts from your investigation upfront, using them to pre-populate obvious answers and clearly indicating fact-based versus inference-based suggestions
+ **Guided probing** - For each "why" question, the system suggests answers based on available facts, requests specific additional context, and guides you to consider important aspects before proceeding
+ **Branch management** - When multiple contributing factors are identified, the system clearly presents branch options, explains relationships between branches, and helps prioritize parallel investigations
+ **Progressive validation** - For each response, the system reformulates answers for clarity, seeks confirmation, highlights key insights, and connects findings to broader context

This approach ensures that you capture all relevant information while maintaining focus on the most critical causal relationships.

**Accessing the guided workflow:**

1. During incident report generation, review the **Facts need attention** section in the right panel.

1. Look for the **Guided 5-Whys analysis** suggestion under **Suggested workflow**.

1. Choose **Guide me** to start the interactive 5 Whys process.

1. Follow the guided prompts to systematically work through each "why" question, building a complete causal chain from symptoms to root cause.

The guided workflow helps ensure you capture comprehensive root cause information by walking you through each step of the 5 Whys methodology. The analysis results are automatically incorporated into your incident report, providing structured documentation for post-incident reviews and organizational learning.

You can also request a 5 Whys analysis through the chat interface by asking questions such as "Perform a 5 Whys analysis for this incident" or "What is the root cause using 5 Whys methodology?"

## Handling complex incidents with multiple causes
<a name="branch-analysis"></a>

Some incidents involve multiple contributing factors that require parallel analysis paths. CloudWatch investigations supports branch analysis to ensure all significant causes are identified and addressed.

**When branch analysis is needed:**
+ Multiple independent failures occurred simultaneously
+ Different system components contributed to the same customer impact
+ Both technical and process failures played significant roles
+ Cascading failures created multiple causal chains

**Branch analysis process:**

1. **Branch identification** - The system identifies points where multiple causes converge or diverge

1. **Parallel investigation** - Each branch is analyzed using the complete 5 Whys methodology

1. **Connection mapping** - Relationships between branches are documented to show how they interact

1. **Integrated resolution** - Remediation plans address all identified root causes and their interactions

This comprehensive approach ensures that complex incidents receive thorough analysis and that all contributing factors are addressed in the final remediation plan.
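
To make the branching structure concrete, the following Python sketch models a 5 Whys analysis as a small tree and lists every causal chain from the first answer to a root cause. It is a conceptual illustration only and does not describe how CloudWatch investigations stores or processes the analysis. The `Why` class and `causal_chains` function are assumptions, and the answers are abbreviated from the complex incident example earlier in this topic.

```python
from dataclasses import dataclass, field


@dataclass
class Why:
    """One question/answer pair; `branches` holds the follow-up whys."""
    question: str
    answer: str
    branches: list["Why"] = field(default_factory=list)


def causal_chains(node: Why, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return every path of answers from this node down to a leaf (root cause)."""
    path = prefix + (node.answer,)
    if not node.branches:
        return [path]
    chains = []
    for child in node.branches:
        chains.extend(causal_chains(child, path))
    return chains


incident = Why(
    "Why did customers experience service degradation?",
    "Availability dropped due to incorrect ingester scaling",
    [Why(
        "Why was ingester scaling incorrect?",
        "AZ balance miscalculation reduced the number of ingesters",
        [Why(
            "Why was AZ balance miscalculated?",
            "Code couldn't process new Kubernetes label formats",
            [
                # Branch A: technical
                Why("Why didn't the code handle new label formats?",
                    "Code expected the old zone label",
                    [Why("Why wasn't backward compatibility implemented?",
                         "Label change wasn't in the reviewed upgrade notes")]),
                # Branch B: process
                Why("Why wasn't this caught in testing?",
                    "Integration tests used clusters with old label formats",
                    [Why("Why didn't testing include label format validation?",
                         "Test setup didn't mirror the production upgrade sequence")]),
            ],
        )],
    )],
)

chains = causal_chains(incident)
for chain in chains:
    print(" -> ".join(chain))
```

Each printed chain shares the first three answers and then diverges, which mirrors how the technical and process branches lead to separate root causes that one remediation plan must cover.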

## Best practices for effective 5 Whys analysis
<a name="5whys-best-practices"></a>

To maximize the effectiveness of 5 Whys analysis in your incident reports, follow these best practices derived from operational experience:

### Question formulation guidelines
<a name="question-formulation"></a>
+ **Start with customer impact** - Begin each analysis with the customer-facing problem to maintain focus on business impact
+ **Increase technical depth progressively** - Move from business impact to technical details as you progress through the questions
+ **Maintain logical continuity** - Ensure each answer naturally leads to the next question without logical gaps
+ **Include supporting evidence** - Reference specific metrics, logs, or timeline events to validate each answer

### Analysis validation
<a name="validation-criteria"></a>

Validate your 5 Whys analysis using these criteria:
+ **Logical flow** - Clear progression from symptoms to root cause with no missing steps
+ **Technical accuracy** - Correct terminology, accurate system behavior descriptions, and valid component interactions
+ **Completeness** - The analysis explains all observed symptoms and reaches a fundamental cause that, if addressed, would prevent recurrence
+ **Actionability** - The root cause identified leads to specific, implementable remediation actions
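
Some of these criteria can be partially checked mechanically before a human review. The following Python sketch flags steps that lack an answer or supporting evidence and a root cause that has no remediation action. It cannot judge logical flow or technical accuracy. The `Step` class, the `review_chain` function, and the evidence strings are hypothetical and are not part of CloudWatch investigations.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    question: str
    answer: str
    evidence: list[str] = field(default_factory=list)  # metrics, logs, timeline events


def review_chain(steps: list[Step], actions: list[str]) -> list[str]:
    """Return human-readable findings; an empty list means no mechanical gaps."""
    if not steps:
        return ["Chain is empty"]
    findings = []
    for number, step in enumerate(steps, start=1):
        if not step.answer.strip():
            findings.append(f"Why {number}: no answer recorded")
        if not step.evidence:
            findings.append(f"Why {number}: no supporting evidence")
    if not actions:
        findings.append("Root cause has no remediation actions")
    return findings


# Evidence strings below are hypothetical placeholders, not real telemetry.
chain = [
    Step("Why are users getting 500 errors?",
         "The application cannot connect to the primary database",
         ["5xx error-rate metric"]),
    Step("Why can't the application connect to the database?",
         "The database instance ran out of available connections",
         ["database connection-count metric"]),
    Step("Why did the database run out of connections?",
         "A batch job opened many connections without closing them",
         []),  # deliberately left without evidence
]

findings = review_chain(chain, actions=["Update code review checklist"])
print(findings)
```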

### Common pitfalls to avoid
<a name="common-pitfalls"></a>
+ **Stopping at symptoms** - Don't conclude the analysis at the first technical failure; continue until you reach systemic or process causes
+ **Blame-focused analysis** - Focus on system and process failures rather than individual actions
+ **Single-path thinking** - Consider multiple contributing factors and use branch analysis when appropriate
+ **Insufficient evidence** - Ensure each answer is supported by concrete data from your investigation

### Integration with incident report sections
<a name="5whys-integration"></a>

The 5 Whys analysis integrates with other sections of your incident report to provide comprehensive documentation:
+ **Timeline correlation** - Each "why" question can reference specific timeline events, providing temporal context for causal relationships
+ **Metrics validation** - Answers are supported by metrics and graphs that demonstrate the technical behaviors described
+ **Impact assessment alignment** - The first "why" directly connects to customer impact metrics documented in the impact assessment section
+ **Lessons learned foundation** - Root causes identified through 5 Whys analysis directly inform the lessons learned and corrective actions sections

This integration ensures consistency across your incident report and provides stakeholders with a complete, coherent narrative from initial symptoms through root cause to remediation plans.

# Integrations with other systems
<a name="Investigations-Integrations"></a>

**Topics**
+ [Integration with AWS Systems Manager Automation](#Investigations-Integrations-SSM)
+ [Integration with third-party chat systems](#Investigations-Integrations-Chat)

## Integration with AWS Systems Manager Automation
<a name="Investigations-Integrations-SSM"></a>

CloudWatch investigations is integrated with Automation, a capability of AWS Systems Manager. You don't need to configure integration, but you might need to update AWS Identity and Access Management (IAM) permissions so you can use Automation runbooks.

**What is AWS Systems Manager?**  
Systems Manager helps you centrally view, manage, and operate *managed nodes* at scale in AWS, on-premises, and multicloud environments. In Systems Manager, a managed node is any machine configured for use with Systems Manager. For information, see the [*AWS Systems Manager User Guide*](https://docs.aws.amazon.com/systems-manager/latest/userguide/).

**What is Systems Manager Automation?**  
Automation performs common maintenance, deployment, troubleshooting, and remediation tasks through the use of *runbooks*. Each runbook defines a number of steps for performing tasks. Each step is associated with a particular *action*. The action determines the inputs, behavior, and outputs of the step. For descriptions of the nearly two dozen actions that are supported for runbooks, see the [Systems Manager Automation actions reference](https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-actions.html) in the *AWS Systems Manager User Guide*.

Automation provides over 400 AWS managed runbooks. For details about each runbook, including a step-by-step description of the actions performed when executed, see the [Systems Manager Automation runbook reference](https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-runbook-reference.html). Customers can also design their own runbooks to address specific scenarios in their environments. For information, see [Creating your own runbooks](https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-documents.html) in the *AWS Systems Manager User Guide*.

For information about working with runbooks in an investigation, see [Reviewing and executing suggested runbook remediations for CloudWatch investigations](suggested-investigation-actions.md).

## Integration with third-party chat systems
<a name="Investigations-Integrations-Chat"></a>

By integrating CloudWatch investigations with CloudWatch investigations in chat applications, you can have updates from investigations sent to third-party chat services, including Slack and Microsoft Teams. The integration is facilitated by Amazon Simple Notification Service (Amazon SNS).

The following topics describe the required steps in the recommended order:

**Topics**
+ [Create and configure the Amazon SNS topic and access policy](#Investigations-Integrations-Chat-policy)
+ [Configure CloudWatch investigations in your chat applications](#Investigations-Integrations-Chatbot-configure)
+ [Add Amazon SNS topics to CloudWatch AI Operations](#Investigations-Integrations-Chat-configure)

### Create and configure the Amazon SNS topic and access policy
<a name="Investigations-Integrations-Chat-policy"></a>

Create an Amazon SNS topic in the same Region as your investigation. For more information, see [Creating an Amazon Simple Notification Service topic](https://docs.aws.amazon.com/sns/latest/dg/sns-create-topic.html).

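To enable CloudWatch investigations to send notifications, you must add the following access policy statement to the Amazon SNS topic. Replace `SNS-TOPIC-ARN` with the ARN of your topic and `account-Id` with your AWS account ID.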

```
{
    "Sid": "AIOPS-CHAT-PUBLISH",
    "Effect": "Allow",
    "Principal": {
        "Service": "aiops.amazonaws.com"
    },
    "Action": "sns:Publish",
    "Resource": "SNS-TOPIC-ARN",
    "Condition": {
        "StringEquals": {
            "aws:SourceAccount": "account-Id"
        }
    }
}
```
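
If the topic already has an access policy, the statement above needs to be merged into it rather than replacing it. The following Python sketch fills in the two placeholders and adds the statement to an existing policy document. The function names, the topic ARN, the account ID, and the existing statement are all hypothetical values used for illustration.

```python
import json


def aiops_publish_statement(topic_arn: str, account_id: str) -> dict:
    """Build the statement shown above with the placeholders filled in."""
    return {
        "Sid": "AIOPS-CHAT-PUBLISH",
        "Effect": "Allow",
        "Principal": {"Service": "aiops.amazonaws.com"},
        "Action": "sns:Publish",
        "Resource": topic_arn,
        "Condition": {"StringEquals": {"aws:SourceAccount": account_id}},
    }


def add_statement(topic_policy: dict, statement: dict) -> dict:
    """Return a copy of the policy with the statement added.

    A statement that already uses the same Sid is replaced, so the
    function can be run repeatedly without creating duplicates.
    """
    kept = [s for s in topic_policy.get("Statement", [])
            if s.get("Sid") != statement["Sid"]]
    updated = dict(topic_policy)
    updated.setdefault("Version", "2012-10-17")
    updated["Statement"] = kept + [statement]
    return updated


# Hypothetical existing topic policy and identifiers.
existing_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ExistingStatement",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "sns:Subscribe",
            "Resource": "arn:aws:sns:us-east-1:111122223333:investigation-updates",
        }
    ],
}

statement = aiops_publish_statement(
    "arn:aws:sns:us-east-1:111122223333:investigation-updates",
    "111122223333",
)
updated = add_statement(existing_policy, statement)
print(json.dumps(updated, indent=4))
```

The resulting JSON can be pasted into the topic's access policy editor in the Amazon SNS console, or set as the topic's `Policy` attribute through the Amazon SNS `SetTopicAttributes` API.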

### Configure CloudWatch investigations in your chat applications
<a name="Investigations-Integrations-Chatbot-configure"></a>

To configure CloudWatch investigations in chat applications for communication with a third-party chat service, use the following tutorials:
+ [Tutorial: Get started with Slack](https://docs.aws.amazon.com/chatbot/latest/adminguide/slack-setup.html).
+ [Tutorial: Get started with Microsoft Teams](https://docs.aws.amazon.com/chatbot/latest/adminguide/teams-setup.html).

Then, to support using AI assistant actions within chat channels, you must provide the CloudWatch investigations in chat applications role with appropriate permissions. When you create a new IAM channel role for the channel, select the **Notifications** and **Amazon Q operations assistant permissions** policy templates.

Attach the **AIOpsOperatorAccess** managed IAM policy to the guardrail policies in CloudWatch investigations in chat applications. This grants permissions to CloudWatch investigations in chat applications to interact with CloudWatch investigations and perform required actions on your behalf.

In the channel configuration, you must also subscribe to the Amazon SNS topic that you created in the previous procedure.

### Add Amazon SNS topics to CloudWatch AI Operations
<a name="Investigations-Integrations-Chat-configure"></a>

You must use the CloudWatch console to configure CloudWatch investigations to integrate with Amazon SNS. You can do this while you create the investigation group in your account, or later.

If you have already created an investigation group and want to add chat integration, follow these steps.

**To add chat integration to an existing investigation group**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **AI Operations**, **Configuration**.

1. On the **Optional configuration** tab, in the **Chat integration** section, do the following:
   + If you have already integrated CloudWatch investigations in chat applications with a third-party chat system, you can choose **Select SNS topic** to choose the Amazon SNS topic to use to send updates about investigations. This Amazon SNS topic will relay those updates to the chat client.
   + If you want to integrate CloudWatch investigations in chat applications with a third-party chat system, choose **Configure new chat client**. For more information about setting up this configuration, see [Getting started with CloudWatch investigations in chat applications](https://docs.aws.amazon.com/chatbot/latest/adminguide/getting-started.html).

When incident reports are generated, notifications can be sent to configured chat channels to alert team members that new documentation is available. These notifications include links to the generated reports and can be customized to include key findings or action items.

# Security in CloudWatch investigations
<a name="Investigations-Security"></a>

This section includes topics about how CloudWatch investigations integrates with AWS security and permissions features.

**Topics**
+ [Default CloudWatch investigations permissions, retention, and encryption](#Ephemeral-Investigations-Security)
+ [User permissions for your CloudWatch investigations group](#Investigations-Security-IAM)
+ [Additional permissions for Database Insights](#Investigations-Security-RDS)
+ [How to control what data CloudWatch investigations has access to during investigations](#Investigations-Security-Data)
+ [Encryption of investigation data](#Investigations-KMS)
+ [Cross-Region inference](#cross-region-inference)

## Default CloudWatch investigations permissions, retention, and encryption
<a name="Ephemeral-Investigations-Security"></a>

When you run investigations using default settings, without additional configuration in your account, the investigation uses the permissions available to your current console session and accesses telemetry data using only read-only permissions. No investigation group IAM role configuration or permission policy is needed. However, this means that the investigation's access to data is limited by the signed-in user's permissions.

This investigation is available only to the user who started it. The investigation can be viewed for only 24 hours, after which it is deleted with no recovery option.

The investigation data is encrypted at rest with an AWS owned key. You can't view or manage AWS owned keys, and you can't use them for other purposes or audit their use. However, you don't have to take any action or change any settings to use these keys. For more information, see [AWS KMS keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html).

## User permissions for your CloudWatch investigations group
<a name="Investigations-Security-IAM"></a>

AWS has created three managed IAM policies that you can use for your users who will be working with your CloudWatch investigations group.
+ [AIOpsConsoleAdminPolicy](managed-policies-cloudwatch.md#managed-policies-QInvestigations-AIOpsConsoleAdminPolicy) – grants an administrator the ability to set up CloudWatch investigations in the account, access to CloudWatch investigations actions, the management of trusted identity propagation, and the management of integration with IAM Identity Center and organizational access.
+ [AIOpsOperatorAccess](managed-policies-cloudwatch.md#managed-policies-QInvestigations-AIOpsOperatorAccess) – grants a user access to investigation actions including starting an investigation. It also grants additional permissions that are necessary for accessing investigation events.
+ [AIOpsReadOnlyAccess](managed-policies-cloudwatch.md#managed-policies-QInvestigations-AIOpsReadOnlyAccess) – grants read-only permissions for CloudWatch investigations and other related services.

We recommend that you use three IAM principals, granting one of them the **AIOpsConsoleAdminPolicy** IAM policy, granting another the **AIOpsOperatorAccess** policy, and granting the third the **AIOpsReadOnlyAccess** policy. These principals could be either IAM roles (recommended) or IAM users. Your users who work with CloudWatch investigations would then sign in using one of these principals.

### Permissions for incident report generation
<a name="Investigations-Security-IAM-IRG"></a>

Incident report generation requires additional permissions that allow the AI assistant to collect events and facts and then create reports.

Users with **AIOpsConsoleAdminPolicy** can generate, edit, and copy incident reports. By default, your investigation group is assigned the **AIOpsAssistantPolicy** to give it access to resources. However, this policy doesn't include the permissions required to generate an incident report. To allow your investigation group to collate the investigation data into an incident report, attach a policy similar to the following example to the investigation group role, or add the additional actions to the role as an inline policy:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "IncidentReportOperations",
            "Effect": "Allow",
            "Action": [
                "aiops:GetInvestigation",
                "aiops:ListInvestigationEvents",
                "aiops:GetInvestigationEvent",

                "aiops:CreateReport",
                "aiops:UpdateReport", 
                "aiops:GetReport",

                "aiops:PutFact",
                "aiops:ListFacts",
                "aiops:GetFact",
                "aiops:GetFactVersions"
            ],
            "Resource": [
                "arn:aws:aiops:*:*:investigation-group/*"
            ]
        }
    ]
}
```

The new managed policy `AIOpsAssistantIncidentReportPolicy` provides the required permissions and is automatically added to investigation groups created after October 10, 2025. For more information, see [AIOpsAssistantIncidentReportPolicy](managed-policies-cloudwatch.md#managed-policies-QInvestigations-AIOpsAssistantIncidentReportPolicy).

## Additional permissions for Database Insights
<a name="Investigations-Security-RDS"></a>

To use Database Insights capabilities during investigations, you must attach the `AmazonRDSPerformanceInsightsFullAccess` managed policy to the IAM role or user that you use to perform investigations. CloudWatch investigations requires these permissions to create and access performance analysis reports for your Amazon RDS database instances.

To attach this policy, use the IAM console to add the `AmazonRDSPerformanceInsightsFullAccess` managed policy to your investigation principal. For more information about this managed policy and its permissions, see [AmazonRDSPerformanceInsightsFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonRDSPerformanceInsightsFullAccess.html).

## How to control what data CloudWatch investigations has access to during investigations
<a name="Investigations-Security-Data"></a>

When you configure an investigation group in your account, you specify which permissions CloudWatch investigations has to access your resources during investigations. You do this by assigning an IAM role to the investigation group.

To enable CloudWatch investigations to access resources and be able to make suggestions and hypotheses, the recommended method is to attach the **AIOpsAssistantPolicy** to the investigation group role. This grants the investigation group permissions to analyze your AWS resources during your investigations. For information about the complete contents of this policy, see [AIOpsAssistantPolicy](managed-policies-cloudwatch.md#managed-policies-QInvestigations-AIOpsAssistant).

You can also choose to attach the general AWS [ReadOnlyAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/ReadOnlyAccess.html) managed policy to the investigation group role, in addition to attaching **AIOpsAssistantPolicy**. The reason to do this is that AWS updates **ReadOnlyAccess** more frequently with permissions for new AWS services and actions as they are released. The **AIOpsAssistantPolicy** will also be updated for new actions, but not as frequently.

If you want to scope down the permissions granted to CloudWatch investigations, you can attach a custom IAM policy to the investigation group IAM role instead of attaching the **AIOpsAssistantPolicy** policy. To do this, start your custom policy with the contents of [AIOpsAssistantPolicy](managed-policies-cloudwatch.md#managed-policies-QInvestigations-AIOpsAssistant) and then remove permissions that you don't want to grant to CloudWatch investigations. This will prevent CloudWatch investigations from making suggestions based on the AWS services or actions that you don't grant access to.
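The scoping-down step can be scripted instead of edited by hand. The following is a minimal sketch, assuming you have saved the policy contents as a JSON document. The `scope_down` function name and the sample policy are hypothetical, and the sample is not the real contents of **AIOpsAssistantPolicy**.

```python
import copy
import json


def scope_down(policy, excluded_prefixes):
    """Return a copy of `policy` without actions for the excluded services.

    `excluded_prefixes` holds IAM service prefixes such as "s3" or "lambda".
    Statements left with no actions are dropped. Wildcard actions such as "*"
    are kept as-is, so review them separately.
    """
    excluded = {prefix.lower() for prefix in excluded_prefixes}
    scoped = copy.deepcopy(policy)
    kept_statements = []
    for statement in scoped.get("Statement", []):
        if "Action" not in statement:
            # For example, a NotAction statement: leave it untouched.
            kept_statements.append(statement)
            continue
        actions = statement["Action"]
        if isinstance(actions, str):  # IAM allows a single string here
            actions = [actions]
        kept = [a for a in actions if a.split(":", 1)[0].lower() not in excluded]
        if kept:
            statement["Action"] = kept
            kept_statements.append(statement)
    scoped["Statement"] = kept_statements
    return scoped


# Illustrative stand-in only, NOT the real AIOpsAssistantPolicy contents.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Telemetry",
            "Effect": "Allow",
            "Action": ["cloudwatch:GetMetricData", "logs:StartQuery", "s3:GetBucketPolicy"],
            "Resource": "*",
        },
        {"Sid": "Storage", "Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "*"},
    ],
}

scoped_policy = scope_down(sample_policy, ["s3"])
print(json.dumps(scoped_policy, indent=2))
```

Filtering by service prefix is only one possible approach. You might instead remove individual actions when you want CloudWatch investigations to keep partial access to a service.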

**Note**  
Anything that CloudWatch investigations can access can be added to the investigation and seen by your investigation operators. We recommend that you align CloudWatch investigations permissions with the permissions that your investigation group operators have.

### Allowing CloudWatch investigations to decrypt encrypted data during investigations
<a name="Investigations-Security-Decrypt"></a>

If you encrypt your data in any of the following services with a customer managed key in AWS KMS, and you want CloudWatch investigations to be able to decrypt that data and include it in investigations, you must attach one or more additional IAM policies to the investigation group IAM role.
+ AWS Step Functions

The policy statement should include a context key for encryption context to help scope down the permissions. For example, the following policy would enable CloudWatch investigations to decrypt data for a Step Functions state machine.

```
{
  "Version": "2012-10-17",
  "Statement": [
        {
            "Sid": "AIOPSKMSAccessForStepFunctions",
            "Effect": "Allow",
            "Principal": {
                "Service": "aiops.amazonaws.com"
            },
            "Action":
            [
                "kms:Decrypt"
            ],
            "Resource": "*",
            "Condition":
            {
                "StringLike":
                {
                     "kms:ViaService": "states.*.amazonaws.com",
                     "kms:EncryptionContext:aws:states:stateMachineArn": "arn:aws:states:region:accountId:stateMachine:*"
                }
            }
        }
    ]
}
```

For more information about these types of policies and using these context keys, see [kms:ViaService](https://docs.aws.amazon.com/kms/latest/developerguide/conditions-kms.html#conditions-kms-via-service) and [kms:EncryptionContext:*context-key*](https://docs.aws.amazon.com/kms/latest/developerguide/conditions-kms.html#conditions-kms-encryption-context) in the AWS Key Management Service Developer Guide, and [aws:SourceArn](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourcearn) in the IAM User Guide.

## Encryption of investigation data
<a name="Investigations-KMS"></a>

For the encryption of your investigation data, AWS offers two options:
+ **AWS owned keys** – By default, CloudWatch investigations encrypts investigation data at rest with an AWS owned key. You can't view or manage AWS owned keys, and you can't use them for other purposes or audit their use. However, you don't have to take any action or change any settings to use these keys. For more information about AWS owned keys, see [AWS owned keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#aws-owned-cmk).
+ **Customer managed keys** – These are keys that you create and manage yourself. You can choose to use a customer managed key instead of an AWS owned key for your investigation data. For more information about customer managed keys, see [Customer managed keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk).

Incident reports generated from investigations use the same encryption settings as the parent investigation. This maintains a consistent security posture across your investigation data and documentation.

**Note**  
CloudWatch investigations automatically enables encryption at rest using AWS owned keys at no charge. If you use a customer managed key, AWS KMS charges apply. For more information about pricing, see [AWS Key Management Service pricing](https://aws.amazon.com/kms/pricing/).

For more information about AWS KMS, see [AWS Key Management Service](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html).

### Using a customer managed key for your investigation group
<a name="Investigations-KMS-customerkey"></a>

You can associate an investigation group with a customer managed key, and then all investigations created in that group will use the customer managed key to encrypt your investigation data at rest.

CloudWatch investigations customer managed key usage has the following conditions:
+ CloudWatch investigations supports only symmetric encryption AWS KMS keys that have the default key spec, `SYMMETRIC_DEFAULT`, and a key usage of `ENCRYPT_DECRYPT`.
+ For a user to create or update an investigation group with a customer managed key, that user must have the `kms:DescribeKey`, `kms:GenerateDataKey`, and `kms:Decrypt` permissions.
+ For a user to create or update an investigation in an investigation group that uses a customer managed key, that user must have the `kms:GenerateDataKey` and `kms:Decrypt` permissions.
+ For a user to view investigation data in an investigation group that uses a customer managed key, that user must have the `kms:Decrypt` permission.
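The permission requirements above can be expressed as a small lookup for pre-flight checks. This is a sketch only: the task names and the `missing_kms_permissions` helper are hypothetical labels chosen for illustration, and only the permission names come from the list above.

```python
# KMS permissions per task, transcribed from the list above.
# The task names are hypothetical labels, not API or console terms.
REQUIRED_KMS_PERMISSIONS = {
    "manage_investigation_group": {"kms:DescribeKey", "kms:GenerateDataKey", "kms:Decrypt"},
    "edit_investigation": {"kms:GenerateDataKey", "kms:Decrypt"},
    "view_investigation": {"kms:Decrypt"},
}


def missing_kms_permissions(task, granted):
    """Return the KMS permissions that `task` needs but `granted` lacks."""
    return sorted(REQUIRED_KMS_PERMISSIONS[task] - set(granted))


# A principal that can only decrypt cannot create or update investigations.
print(missing_kms_permissions("edit_investigation", ["kms:Decrypt"]))
```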

### Setting up investigations to use an AWS KMS customer managed key
<a name="Investigations-KMS-Setup"></a>

First, if you don't already have a symmetric key that you want to use, create a new key with the following command.

```
aws kms create-key
```

The command output includes the key ID and the Amazon Resource Name (ARN) of the key. You will need those in later steps in this section. The following is an example of this output.

```
{
    "KeyMetadata": {
        "Origin": "AWS_KMS",
        "KeyId": "1234abcd-12ab-34cd-56ef-1234567890ab",
        "Description": "",
        "KeyManager": "CUSTOMER",
        "Enabled": true,
        "CustomerMasterKeySpec": "SYMMETRIC_DEFAULT",
        "KeyUsage": "ENCRYPT_DECRYPT",
        "KeyState": "Enabled",
        "CreationDate": 1478910250.94,
        "Arn": "arn:aws:kms:us-west-2:111122223333:key/6f815f63-e628-448c-8251-e4EXAMPLE",
        "AWSAccountId": "111122223333",
        "EncryptionAlgorithms": [
            "SYMMETRIC_DEFAULT"
        ]
    }
}
```
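Before continuing, you can confirm that the key meets the requirements for investigation groups. The following sketch assumes that you saved the command output as JSON text. The `check_key_for_investigations` function is hypothetical. It reads `KeySpec` when present and falls back to the older `CustomerMasterKeySpec` field shown in the example output.

```python
import json


def check_key_for_investigations(create_key_output):
    """Return a list of problems found in `aws kms create-key` style JSON output."""
    metadata = json.loads(create_key_output)["KeyMetadata"]
    # Newer output includes KeySpec; older output has only CustomerMasterKeySpec.
    key_spec = metadata.get("KeySpec", metadata.get("CustomerMasterKeySpec"))
    problems = []
    if key_spec != "SYMMETRIC_DEFAULT":
        problems.append(f"key spec is {key_spec}, expected SYMMETRIC_DEFAULT")
    if metadata.get("KeyUsage") != "ENCRYPT_DECRYPT":
        problems.append(f"key usage is {metadata.get('KeyUsage')}, expected ENCRYPT_DECRYPT")
    return problems


# Trimmed sample modeled on the example output above.
sample_output = """
{
    "KeyMetadata": {
        "KeyId": "1234abcd-12ab-34cd-56ef-1234567890ab",
        "CustomerMasterKeySpec": "SYMMETRIC_DEFAULT",
        "KeyUsage": "ENCRYPT_DECRYPT"
    }
}
"""

print(check_key_for_investigations(sample_output))
```

An empty list means that neither check found a problem.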

**Set permissions on the key**

Next, set permissions on the key. By default, all AWS KMS keys are private. Only the resource owner can use them to encrypt and decrypt data. However, the resource owner can grant permissions to access the key to other users and resources. With this step, you give the AI Operations service principal permission to use the key. This service principal must be in the same AWS Region where the KMS key is stored.

As a best practice, we recommend that you restrict the use of the KMS key to only those AWS accounts or resources that you specify.

The first step to set the permissions is to save the default policy for your key as `policy.json`. Use the following command to do so. Replace *key-id* with the ID of your key.

```
aws kms get-key-policy --key-id key-id --policy-name default --output text > ./policy.json
```

Open the `policy.json` file in a text editor and add the following policy sections into that policy. Separate the existing statement from the new sections with a comma. These new sections use `Condition` sections to enhance the security of the AWS KMS key. For more information, see [AWS KMS keys and encryption context](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/encrypt-log-data-kms.html#encrypt-log-data-kms-policy).

This policy provides permissions for service principals for the following reasons:
+ The `aiops` service needs the `GenerateDataKey` permission to get a data key and use it to encrypt your data while it is stored at rest. The `Decrypt` permission is needed to decrypt your data while reading from the data store. The decryption happens when you read the data using `aiops` APIs or when you update the investigation or investigation event. The update operation decrypts and fetches the existing data, updates it, and then encrypts the updated data before storing it in the data store.
+ The CloudWatch alarms service can create investigations or investigation events. These create operations verify that the caller has access to the AWS KMS key defined for the investigation group. The policy statement gives the `GenerateDataKey` and `Decrypt` permissions to the CloudWatch alarms service so that it can create investigations on your behalf.

**Note**  
The following policy assumes that you follow the recommendation of using three IAM principals, and granting one of them the **AIOpsConsoleAdminPolicy** IAM policy, granting another the **AIOpsOperatorAccess** policy, and granting the third the **AIOpsReadOnlyAccess** policy. These principals could be either IAM roles (recommended) or IAM users. Then your users who work with CloudWatch investigations would sign on with one of these principals.  
For the following policy, you'll need the ARNs of those three principals.

```
{
    "Sid": "Enable AI Operations Admin for the DescribeKey permissions",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::{account-id}:role/{AIOpsConsoleAdmin}"
    },
    "Action": [
        "kms:DescribeKey"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "kms:ViaService": "aiops.{region}.amazonaws.com"
        }
    }
},
{
    "Sid": "Enable AI Operations Admin and Operator for the Decrypt and GenerateDataKey permissions",
    "Effect": "Allow",
    "Principal": {
        "AWS": [
            "arn:aws:iam::{account-id}:role/{AIOpsConsoleAdmin}",
            "arn:aws:iam::{account-id}:role/{AIOpsOperator}"
        ]
    },
    "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "kms:ViaService": "aiops.{region}.amazonaws.com"
        },
        "ArnLike": {
            "kms:EncryptionContext:aws:aiops:investigation-group-arn": "arn:aws:aiops:{region}:{account-id}:investigation-group/*"
        }
    }
},
{
    "Sid": "Enable AI Operations ReadOnly for the Decrypt permission",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::{account-id}:role/{AIOpsReadOnly}"
    },
    "Action": [
        "kms:Decrypt"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "kms:ViaService": "aiops.{region}.amazonaws.com"
        },
        "ArnLike": {
            "kms:EncryptionContext:aws:aiops:investigation-group-arn": "arn:aws:aiops:{region}:{account-id}:investigation-group/*"
        }
    }
},
{
    "Sid": "Enable the AI Operations service to have the DescribeKey permission",
    "Effect": "Allow",
    "Principal": {
        "Service": "aiops.amazonaws.com"
    },
    "Action": [
        "kms:DescribeKey"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "aws:SourceAccount": "{account-id}"
        },
        "StringLike": {
            "aws:SourceArn": "arn:aws:aiops:{region}:{account-id}:investigation-group/*"
        }
    }
},
{
    "Sid": "Enable the AI Operations service to have the Decrypt and GenerateDataKey permissions",
    "Effect": "Allow",
    "Principal": {
        "Service": "aiops.amazonaws.com"
    },
    "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "aws:SourceAccount": "{account-id}"
        },
        "StringLike": {
            "aws:SourceArn": "arn:aws:aiops:{region}:{account-id}:investigation-group/*"
        },
        "ArnLike": {
            "kms:EncryptionContext:aws:aiops:investigation-group-arn": "arn:aws:aiops:{region}:{account-id}:investigation-group/*"
        }
    }
},
{
    "Sid": "Enable CloudWatch to have the Decrypt and GenerateDataKey permissions",
    "Effect": "Allow",
    "Principal": {
        "Service": "aiops.alarms.cloudwatch.amazonaws.com"
    },
    "Action": [
        "kms:GenerateDataKey",
        "kms:Decrypt"
    ],
    "Resource": "*",
    "Condition": {
        "ArnLike": {
            "kms:EncryptionContext:aws:aiops:investigation-group-arn": "arn:aws:aiops:{region}:{account-id}:investigation-group/*"
        },
        "StringEquals": {
            "aws:SourceAccount": "{account-id}",
            "kms:ViaService": "aiops.{region}.amazonaws.com"
        },
        "StringLike": {
            "aws:SourceArn": "arn:aws:cloudwatch:{region}:{account-id}:alarm:*"
        }
    }
}
```
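If you prefer not to edit `policy.json` by hand, the merge can be scripted. The following is a minimal sketch under stated assumptions: the new statements are held as text in the comma-separated form shown above, and the placeholder names in `values` match the `{...}` tokens in that text. The function names are hypothetical, and the sample uses the standard default key policy and only one of the statements.

```python
import json


def fill_placeholders(template, values):
    """Replace {name} tokens such as {region} and {account-id} in the template text."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", value)
    return template


def merge_statements(policy, new_statements):
    """Append statements to a key policy, replacing any existing statement with the same Sid."""
    new_sids = {s.get("Sid") for s in new_statements if s.get("Sid")}
    kept = [s for s in policy.get("Statement", []) if s.get("Sid") not in new_sids]
    merged = dict(policy)
    merged["Statement"] = kept + list(new_statements)
    return merged


# The default key policy, as saved by get-key-policy (example account ID).
default_policy = {
    "Version": "2012-10-17",
    "Id": "key-default-1",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "kms:*",
            "Resource": "*",
        }
    ],
}

# One statement from the policy sections above, with its placeholders intact.
fragment = """
{
    "Sid": "Enable the AI Operations service to have the DescribeKey permission",
    "Effect": "Allow",
    "Principal": {"Service": "aiops.amazonaws.com"},
    "Action": ["kms:DescribeKey"],
    "Resource": "*",
    "Condition": {
        "StringEquals": {"aws:SourceAccount": "{account-id}"},
        "StringLike": {"aws:SourceArn": "arn:aws:aiops:{region}:{account-id}:investigation-group/*"}
    }
}
"""

values = {"region": "us-west-2", "account-id": "111122223333"}
# Wrapping the comma-separated statements in brackets makes them a JSON array.
statements = json.loads("[" + fill_placeholders(fragment, values) + "]")
updated_policy = merge_statements(default_policy, statements)
print(json.dumps(updated_policy, indent=2))
```

Replacing statements by `Sid` keeps the script safe to rerun, because applying the same statements twice does not create duplicates. To use the result, write the printed JSON back to `policy.json`.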

After you've updated the policy, assign it to the key by entering the following command.

```
aws kms put-key-policy --key-id key-id --policy-name default --policy file://policy.json
```

**Associate the key with the investigation group**

When you use the CloudWatch console to create an investigation group, you can choose to associate the AWS KMS key with the investigation group. For more information, see [Set up an investigation group](Investigations-GetStarted-Group.md).

You can also associate a customer managed key with an existing investigation group.

### Changing your encryption configuration
<a name="Investigations-KMS-Changes"></a>

You can update an investigation group to switch between a customer managed key and an AWS owned key. You can also change from using one customer managed key to using another. When you make such a change, the change applies to new investigations created after the change. Previous investigations are still associated with the old encryption configuration. Current ongoing investigations also continue using the original key for new data.

As long as a previously used key is active and CloudWatch investigations has access to it, you can retrieve the older investigations encrypted with that key, as well as data in current investigations that was encrypted with it. If you delete a previously used key or revoke access to it, the investigation data encrypted with that key can't be retrieved.

## Cross-Region inference
<a name="cross-region-inference"></a>

CloudWatch investigations uses *cross-Region inference* to distribute traffic across different AWS Regions. Although the data remains stored only in the primary Region, when using cross-Region inference, your investigation data might move outside of your primary Region. All data is encrypted in transit across Amazon’s secure network.

For details about where cross-Region inference distribution occurs for each Region, see the following table.


| Supported CloudWatch investigations geography | Investigation Region | Possible inference Regions | 
| --- | --- | --- | 
| United States (US) | US East (N. Virginia) | US East (N. Virginia), US East (Ohio), US West (Oregon) | 
|  | US East (Ohio) | US East (N. Virginia), US East (Ohio), US West (Oregon) | 
|  | US West (Oregon) | US East (N. Virginia), US East (Ohio), US West (Oregon) | 
| Europe (EU) | Europe (Frankfurt) | Europe (Frankfurt), Europe (Ireland), Europe (Paris), Europe (Stockholm) | 
|  | Europe (Ireland) | Europe (Frankfurt), Europe (Ireland), Europe (Paris), Europe (Stockholm) | 
|  | Europe (Spain) | Europe (Frankfurt), Europe (Ireland), Europe (Paris), Europe (Stockholm) | 
|  | Europe (Stockholm) | Europe (Frankfurt), Europe (Ireland), Europe (Paris), Europe (Stockholm) | 
| Asia-Pacific (AP) | Asia Pacific (Hong Kong) | US East (N. Virginia), US East (Ohio), US West (Oregon) | 
|  | Asia Pacific (Mumbai) | US East (N. Virginia), US East (Ohio), US West (Oregon) | 
|  | Asia Pacific (Singapore) | US East (N. Virginia), US East (Ohio), US West (Oregon) | 
|  | Asia Pacific (Sydney) | US East (N. Virginia), US East (Ohio), US West (Oregon) | 
|  | Asia Pacific (Tokyo) | US East (N. Virginia), US East (Ohio), US West (Oregon) | 
|  | Asia Pacific (Malaysia) | US East (N. Virginia), US East (Ohio), US West (Oregon) | 
|  | Asia Pacific (Thailand) | US East (N. Virginia), US East (Ohio), US West (Oregon) | 

# CloudWatch investigations data retention
<a name="Investigations-Retention"></a>

The retention period that you set for an investigation group determines how long that investigation data is kept. Valid values are 7 to 90 days.

After you first create an investigation, if you don't end it manually, it moves to a CLOSED state automatically after seven days. Then, the retention period determines how long the data is kept after the investigation moves to the CLOSED state. The data that is kept during the retention period includes the data in the investigation, accepted and discarded findings, and AI assistant audit log messages.

When this retention period expires, the investigation data is deleted.

If you manually end an investigation, the investigation also moves to the CLOSED state, and the retention period begins at that point.
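The timing rules above can be worked through with a short calculation. This is a sketch of one reading of those rules, not a description of how the service schedules deletion: the `deletion_time` function is hypothetical, and exact-to-the-second timing is an assumption.

```python
from datetime import datetime, timedelta, timezone

AUTO_CLOSE_AFTER = timedelta(days=7)  # investigations close automatically after seven days


def deletion_time(created_at, retention_days, ended_at=None):
    """Estimate when investigation data is deleted.

    The retention period starts when the investigation moves to CLOSED,
    either manually (`ended_at`) or automatically seven days after creation.
    """
    if not 7 <= retention_days <= 90:
        raise ValueError("retention_days must be between 7 and 90")
    auto_close = created_at + AUTO_CLOSE_AFTER
    closed_at = min(ended_at, auto_close) if ended_at else auto_close
    return closed_at + timedelta(days=retention_days)


created = datetime(2025, 1, 1, tzinfo=timezone.utc)

# Never ended manually: closes on January 8, data deleted 30 days later.
print(deletion_time(created, retention_days=30))

# Ended manually on January 3: the retention period starts earlier.
print(deletion_time(created, retention_days=30, ended_at=datetime(2025, 1, 3, tzinfo=timezone.utc)))
```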

**Note**  
Investigations conducted without configuring your CloudWatch investigations settings are linked to individual user sessions and are deleted after 24 hours, with no recovery option available.

## Incident report retention
<a name="Investigations-Retention-IncidentReports"></a>

Incident reports generated from investigations follow the same retention policy as their parent investigations. 

We recommend copying important incident reports to external systems if you need to retain them beyond the investigation retention period.

For more information, see [Generate incident reports](Investigations-Incident-Reports.md).

# Troubleshooting
<a name="Investigations-Troubleshooting"></a>

**Topics**
+ [CloudWatch investigations cannot assume the necessary IAM roles or permissions. Please verify required roles and permissions are correctly configured](#Investigations-Troubleshooting-Permissions)
+ [Unable to identify event source. Verify that the resource exists in your application topology and the resource type is supported.](#Investigations-Troubleshooting-eventsource)
+ [Analysis complete. Submit additional findings to receive updated suggestions](#Investigations-Troubleshooting-complete)
+ [Source account status shows "Pending link to monitoring account"](#Investigations-Troubleshooting-cross-account)
+ [Incident report generation issues](#Investigations-Troubleshooting-IncidentReports)

## CloudWatch investigations cannot assume the necessary IAM roles or permissions. Please verify required roles and permissions are correctly configured
<a name="Investigations-Troubleshooting-Permissions"></a>

CloudWatch investigations uses an IAM role to access information in your topology. This IAM role must be configured with adequate permissions. For more information about the necessary permissions, see [How to control what data CloudWatch investigations has access to during investigations](Investigations-Security.md#Investigations-Security-Data).

## Unable to identify event source. Verify that the resource exists in your application topology and the resource type is supported.
<a name="Investigations-Troubleshooting-eventsource"></a>

We recommend that you enable several AWS services and features that provide additional valuable information to CloudWatch investigations. These services and features can help CloudWatch investigations identify event sources. For more information, see [(Recommended) Best practices to enhance investigations](Investigations-RecommendedServices.md).

## Analysis complete. Submit additional findings to receive updated suggestions
<a name="Investigations-Troubleshooting-complete"></a>

When you see this message, CloudWatch investigations has finished analyzing your topology and telemetry based on the findings so far. If you think that the root cause hasn't been found, you can manually add more telemetry to the investigation. This might cause CloudWatch investigations to scan your system again based on the new information.

To add new telemetry, navigate to that service's console and add the telemetry to the investigation. For example, to add a Lambda metric to the investigation, you can do the following:

1. Open the Lambda console.

1. In the **Monitor** section, find the metric.

1. Open the vertical ellipsis context menu ![Vertical ellipsis icon](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/images/vmore.png) for the metric, and choose **Investigate**, **Add to investigation**. Then, in the **Investigate** pane, select the name of the investigation.

## Source account status shows "Pending link to monitoring account"
<a name="Investigations-Troubleshooting-cross-account"></a>

Check that both your monitoring account and source account are set up correctly.

1. Check that the source account ID and the source account role name are correct.

1. The monitoring account role needs to have `sts:AssumeRole` permission to assume the source account role.

1. The source account role needs to have a trust policy for the monitoring account role to assume.

1. The trust policy in the source account role must be scoped so that the `sts:ExternalId` context key contains the investigation group ARN:

   ```
               "Condition": {
                   "StringEquals": {
                       "sts:ExternalId": "investigation-group-arn"
                   }
               }
   ```
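The checks above can be put together as a pair of policy documents. The following sketch builds both documents so that you can compare them with what is configured in each account. The function names and all ARNs are hypothetical examples, and the documents show only the elements named in the steps above.

```python
import json


def source_account_trust_policy(monitoring_role_arn, investigation_group_arn):
    """Trust policy for the source account role, scoped by sts:ExternalId."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": monitoring_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": investigation_group_arn}
                },
            }
        ],
    }


def monitoring_role_permission(source_role_arn):
    """Permission policy that lets the monitoring account role assume the source role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": source_role_arn}
        ],
    }


# Hypothetical example ARNs; replace them with your own values.
monitoring_role = "arn:aws:iam::111122223333:role/ExampleMonitoringRole"
source_role = "arn:aws:iam::444455556666:role/ExampleSourceRole"
group_arn = "arn:aws:aiops:us-west-2:111122223333:investigation-group/EXAMPLE"

trust_policy = source_account_trust_policy(monitoring_role, group_arn)
permission_policy = monitoring_role_permission(source_role)
print(json.dumps(trust_policy, indent=2))
print(json.dumps(permission_policy, indent=2))
```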

## Incident report generation issues
<a name="Investigations-Troubleshooting-IncidentReports"></a>

This section describes common issues you might encounter when generating incident reports and how to resolve them.

### Report generation fails or produces incomplete content
<a name="Investigations-Troubleshooting-IncidentReports-Generation"></a>

If incident report generation fails, verify the following:
+ The investigation has at least one accepted hypothesis in the **Feed**.
+ Your user and your investigation group have the necessary permissions. For more information, see [User permissions for your CloudWatch investigations group](Investigations-Security.md#Investigations-Security-IAM).