Investigate operational issues in your environment - Amazon CloudWatch

Investigate operational issues in your environment

Note

The Amazon Q Developer operational investigations feature is in preview release and is subject to change. It is currently available only in the US East (N. Virginia) Region.

Create an investigation

Create an investigation from an AWS console page

You can start an investigation from several AWS consoles, including (but not limited to) CloudWatch alarm pages, CloudWatch metric pages, and Lambda monitoring pages.

To start an investigation from an AWS console page
  1. In the graph of the metric or alarm that you want to investigate, select the time range that you want the investigation to include.

  2. If the top of the page has an Investigate button, choose it and then choose Start new investigation.

    Otherwise, choose the vertical ellipsis menu icon for the metric, and choose Investigate, Start a new investigation.

  3. In the Investigation pane, enter a name for the investigation in New investigation title, and optionally enter notes about the selected metric or alarm. Then choose Start investigation.

    The investigation starts. Amazon Q Developer scans your telemetry data to find data that might be associated with this situation.

  4. To move the investigation data to the larger pane, choose Open in full page.

  5. For detailed instructions about steps that you can take while continuing the investigation, see View and continue an open investigation.

Create an investigation from Amazon Q chat

You can ask questions about issues in your deployment in Amazon Q Developer chat. The question could be something like "Why is my Lambda function slow today?"

When you do so, Amazon Q Developer might ask follow-up questions and run a health check regarding the issue. After the health check, the chat asks whether you want to start an investigation.

For more information and more sample questions, see Chatting with Amazon Q Developer about AWS.

For detailed instructions about steps that you can take while continuing the investigation after it has been started, see View and continue an open investigation.

Create an investigation from a CloudWatch alarm action

When you create a CloudWatch alarm, you can configure it to automatically start an investigation when it goes into ALARM state. You can do this for both metric alarms and composite alarms. For more information about creating alarms, see Alarming on metrics and Create a composite alarm.
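
As a rough illustration of what such an alarm could look like outside the console, the following Python sketch builds the request parameters for the CloudWatch PutMetricAlarm API with an investigation group as the alarm action. It makes no AWS call. The ARN format, account ID, investigation group ID, alarm name, and function name are all assumptions made for illustration; confirm the action ARN format in the PutMetricAlarm API reference before relying on it.

```python
def investigation_action_arn(region: str, account_id: str, group_id: str) -> str:
    """Build an alarm-action ARN for an investigation group.

    Assumption: the ARN follows the pattern below. Verify it against the
    PutMetricAlarm API reference for your Region.
    """
    return f"arn:aws:aiops:{region}:{account_id}:investigation-group:{group_id}"


def build_alarm_request(alarm_name: str, function_name: str, action_arn: str) -> dict:
    """Return keyword arguments for PutMetricAlarm. Makes no AWS call."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,               # seconds per evaluation period
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Actions listed here run when the alarm goes into ALARM state.
        "AlarmActions": [action_arn],
    }


# Hypothetical placeholder values.
arn = investigation_action_arn("us-east-1", "111122223333", "example-group-id")
request = build_alarm_request("lambda-errors-investigate", "my-function", arn)
print(request["AlarmActions"][0])
```

If you use the AWS SDK for Python, a dictionary like this could be passed as `boto3.client("cloudwatch").put_metric_alarm(**request)`, given credentials with the necessary permissions.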

View and continue an open investigation

Use the steps in this section to view and continue an existing investigation.

To view and continue an investigation
  1. If you aren't already on the page for the investigation, do the following:

    1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

    2. In the left navigation pane, choose AI Operations, Investigations.

    3. Choose the name of the investigation.

  2. The Feed section displays the items that have been added to the investigation findings, including the metric or alarm that was originally selected to start the investigation.

    The pane on the right includes tabs. Choose the Suggestions tab.

  3. The Suggestions tab displays observations of other telemetry that Amazon Q Developer has found that might be related to the investigation. It might also include hypotheses, which are possible reasons or root causes that Amazon Q Developer has found for the situation.

    Both observations and hypotheses are written in natural language by Amazon Q Developer.

    You have several options:

    • For each suggestion, you can choose Accept or Discard.

      When you choose Accept, the suggestion is added to the Feed section, and Amazon Q Developer uses this information to direct further scanning and suggestions.

      If you choose Discard, the suggestion is moved to the Discarded tab.

    • For each observation-type suggestion, you can choose to expand the graph in the Suggestions tab, or open it in the CloudWatch console to see more details about it.

    • Some of the observations might be results of CloudWatch Logs Insights queries that Amazon Q Developer ran as part of the investigation. When an observation is a CloudWatch Logs Insights query result, the query itself is displayed as part of the observation. You can edit the query and re-run it. To do so, choose the vertical ellipsis menu icon by the results, and then choose Open in Logs Insights. For more information, see Analyzing log data with CloudWatch Logs Insights.

    • If you know of telemetry in an AWS service that might apply to this investigation, you can go to that service's console and add the telemetry to the investigation. For example, to add a Lambda metric to the investigation, you can do the following:

      1. Open the Lambda console.

      2. In the Monitor section, find the metric.

      3. Open the vertical ellipsis context menu for the metric, and choose Investigate, Add to investigation. Then, in the Investigate pane, select the name of the investigation.

    • When you view a hypothesis in the Suggestions tab, you can choose Show reasoning to display the data that Amazon Q Developer used to generate the hypothesis.

    • You can choose the Discarded tab and view the suggestions that have been previously discarded. To add one of them to the findings, choose Restore to findings.

    • To add notes to the findings, choose New note in the Feed pane. Then enter your notes and choose Add.

  4. When you add a hypothesis to the Feed area, it might display Show suggested actions. If so, choosing this displays possible actions that you can take, assuming that hypothesis is correct about the issue. Possible actions include the following:

    • Documentation suggestions are links to AWS documentation that can help you understand the issue that you are working on, and how to solve it. To view suggested documentation, choose its Review link.

    • Runbook suggestions are suggestions that leverage the pre-defined runbooks in Systems Manager Automation. Each runbook defines a number of steps for performing a task on an AWS resource.

      Important

      There is a charge for executing an Automation runbook. However, Amazon Q Developer operational investigations provides you with a preview of actions that a suggested runbook takes, giving you an opportunity to better evaluate whether to execute the runbook. For information about Automation pricing, see AWS Systems Manager pricing for Automation.

      For information about continuing with a runbook action, see Reviewing and executing suggested runbook remediations for Amazon Q Developer operational investigations before continuing with the following step in this procedure.

  5. To end an investigation, choose End investigation and then optionally add final notes. Then choose Save.

    The investigation status changes to Archived. You can restart archived investigations by opening the investigation page and choosing Restart investigation.

    We recommend that you don't leave investigations open indefinitely, because alarm state transitions related to the investigation will keep being added to the investigation as long as it is open.

Note

At some points, you might see the message "Completed the analysis. Finished with the investigation." displayed above the Feed area. If you then add more telemetry to the findings, this message changes and Amazon Q Developer begins scanning your telemetry again, based on the new data that you added to the findings.
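
The way suggestions move between the Suggestions tab, the Feed section, and the Discarded tab in the preceding procedure can be pictured with a small Python model. This is only a conceptual sketch: the `Investigation` class and its method names are hypothetical and do not correspond to any AWS API.

```python
from dataclasses import dataclass, field


@dataclass
class Investigation:
    """Toy model of the Suggestions, Feed, and Discarded lists (not an AWS API)."""
    suggestions: list = field(default_factory=list)
    feed: list = field(default_factory=list)
    discarded: list = field(default_factory=list)

    def accept(self, item: str) -> None:
        # Accept: the suggestion is added to the Feed section.
        self.suggestions.remove(item)
        self.feed.append(item)

    def discard(self, item: str) -> None:
        # Discard: the suggestion is moved to the Discarded tab.
        self.suggestions.remove(item)
        self.discarded.append(item)

    def restore(self, item: str) -> None:
        # Restore to findings: a discarded suggestion is added to the findings.
        self.discarded.remove(item)
        self.feed.append(item)


# Hypothetical suggestion titles.
inv = Investigation(suggestions=["High Lambda duration", "DynamoDB throttling", "Unrelated deploy"])
inv.accept("High Lambda duration")
inv.discard("DynamoDB throttling")
inv.restore("DynamoDB throttling")
print(inv.feed)
```

In this model, a restored suggestion ends up in the Feed alongside accepted ones, which mirrors the Restore to findings option described above.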

Reviewing and executing suggested runbook remediations for Amazon Q Developer operational investigations

When you add a hypothesis to the Feed area of an active investigation, Amazon Q Developer operational investigations might display Show suggested actions. One suggested action might be to view documentation with information to help you remediate a problem manually.

Another suggestion might be to use an Automation runbook to attempt to automatically resolve the issue. Automation is a capability of Systems Manager, another AWS service. Automation runbooks define a series of steps, or actions, to be run on the resources that you select. Each runbook is designed to address a specific issue. Runbooks can address a variety of operational needs: creating, repairing, reconfiguring, installing, troubleshooting, remediating, duplicating, and more. For more information about Automation, see Integration with AWS Systems Manager Automation.

Before you begin

Before working with Automation runbooks in an investigation, be aware of the following important considerations:

  • Choosing to execute a runbook incurs charges. For information, see AWS Systems Manager pricing.

  • Root causes and runbook suggestions are powered by automated reasoning and generative artificial intelligence services.

    Important

    You are responsible for actions that result from executing runbook steps and the choice of parameter values entered during runbook execution. You might need to edit the suggested runbook to make sure the runbook performs as expected. For more information, see AWS responsible AI policy.

  • Depending on the runbook, you might need to enter values for the runbook's Input parameters before the execution can run.

  • The runbook executes using the IAM permissions assigned to the operator. If necessary, sign in with different IAM permissions to execute the runbook. In addition to permissions for the actions being taken, you'll need additional Systems Manager permissions to execute runbook steps. For more information, see Setting up Automation in the AWS Systems Manager User Guide.

To review and execute suggested runbook actions for Amazon Q Developer operational investigations
  1. To view information about a suggested runbook, choose Review for information about how to execute the runbook steps.

    On the investigation details page, choose Suggestions.

  2. In the Suggestions pane, review the list of hypotheses based on the system's analysis of the issue under investigation.

    For each hypothesis, you can choose from the following options:

    • Show reasoning – View more information about why the system has generated the hypothesis.

    • View actions – View the suggested actions for the issue. Not all hypotheses will include suggested actions.

    • Accept – Accept the hypothesis and add it to the investigation's Feed section.

      Note

      Accepting the hypothesis doesn't automatically run the associated runbook solution. You can view suggested runbooks before accepting a hypothesis, but you must accept the hypothesis to execute a runbook.

    • Discard – Reject the hypothesis and don't engage with it any further.

  3. After you choose View actions, in the Suggested actions pane, review the list of suggested actions you can take to address the issue. Suggested actions can include one or more of the following:

    • AWS knowledge articles – Provides information about steps you can take to manually address the issue, plus a link to more information.

    • AWS documentation – Provides links to user documentation topics related to the issue.

    • AWS-owned runbooks – Lists one or more Automation runbooks that are managed by AWS that you can run to attempt issue resolution.

    • Runbooks owned by you – Lists one or more custom Automation runbooks created by you or someone else in your account or organization, which you can run to attempt issue resolution.

      Note

      The system automatically generates this list of runbooks by evaluating keywords in your custom runbooks and then comparing them to terms related to the issue being investigated.

      More keyword matches mean a particular custom runbook appears higher in the Runbooks owned by you list.

  4. After reviewing the hypothesis, you can examine a specific suggested action further and read related documentation by choosing Learn more. You can also choose Review details to inspect suggested runbooks owned by AWS or by you.

  5. When choosing Review details for runbooks, do the following:

    1. For Runbook description, review the content, which provides an overview of the actions the runbook can take to remediate the issue being investigated. Choose View steps to visualize the runbook's workflow and drill into the details of individual steps.

    2. For Input parameters, specify values for any parameters required by the runbook. These parameters vary from runbook to runbook.

    3. For Execution preview, carefully review the information. This information explains what the scope and impact would be if you choose to execute the runbook.

      The Execution preview content provides the following information:

      • How many accounts and Regions the runbook operation would occur in.

      • The types of actions that would be taken, and how many of each type.

        Action types include the following:

        • Mutating: A runbook step would make changes to the targets through actions that create, modify, or delete resources.

        • Non-Mutating: A runbook step would retrieve data about resources but not make changes to them. This category generally includes Describe, List, Get, and similar read-only API actions.

        • Undetermined: An undetermined step invokes executions performed by another orchestration service like AWS Lambda, AWS Step Functions, or Run Command, a capability of AWS Systems Manager. An undetermined step might also call a third-party API or run a Python or PowerShell script. Systems Manager Automation can't detect what the outcome would be of the orchestration processes or third-party API executions, and therefore can't evaluate them. The results of those steps would have to be manually reviewed to determine their impact.

        For information about supported actions and their impact types, see Remediation impact types of runbook actions in the AWS Systems Manager User Guide.

    4. Review the preview information carefully before deciding whether to proceed.

      At this point, you can choose one of the following actions:

      • Stop and do not execute the runbook.

      • Change the input parameters before executing the runbook.

      • Execute the runbook with the options you have already selected.

    Important

    Choosing to execute the runbook incurs charges. For information, see AWS Systems Manager pricing.

  6. If you want to execute the runbook, choose Execute.

    If you already accepted the hypothesis, the execution runs.

    If you have not already accepted the hypothesis, a dialog box prompts you to accept it before the execution runs.
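
To make the Execution preview in step 5 more concrete, the following Python sketch tallies runbook steps by impact type. It is a deliberate simplification and not how Systems Manager Automation evaluates runbooks: it assumes that the impact can be guessed from the API action name prefix alone, and it treats any step without a known API action (for example, a script or a call to another orchestration service) as Undetermined. The prefix lists and the sample steps are hypothetical.

```python
from collections import Counter
from typing import Optional

# Assumption: classify by verb prefix only. The actual mapping is described in
# "Remediation impact types of runbook actions" in the Systems Manager User Guide.
READ_ONLY_PREFIXES = ("Describe", "List", "Get")
MUTATING_PREFIXES = ("Create", "Modify", "Delete")


def impact_type(api_action: Optional[str]) -> str:
    """Guess the impact type of a single runbook step."""
    if api_action is None:
        # Scripts or calls to other orchestration services can't be evaluated.
        return "Undetermined"
    if api_action.startswith(READ_ONLY_PREFIXES):
        return "Non-Mutating"
    if api_action.startswith(MUTATING_PREFIXES):
        return "Mutating"
    return "Undetermined"


# Hypothetical steps; None stands for a script step.
steps = ["DescribeInstances", "ModifyInstanceAttribute", None, "GetFunctionConfiguration"]
tally = Counter(impact_type(step) for step in steps)
print(dict(tally))
```

A tally like this corresponds to the "how many of each type" summary in the preview. Any step counted as Undetermined still needs manual review before you execute the runbook.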

After you choose Execute for a runbook, that action is added to the Feed pane of the investigation. From the investigation, you can monitor new data in the metrics in the findings to see if the runbook actions are correcting the issue.
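
The note in step 3 says that custom runbooks with more keyword matches appear higher in the Runbooks owned by you list. The following Python sketch shows one way such a ranking could work in principle. The tokenization, the scoring, and the runbook names and descriptions are all assumptions made for illustration; the matching logic that the service actually uses is not documented here.

```python
import re


def keywords(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_runbooks(issue_terms: str, runbooks: dict) -> list:
    """Order runbooks by how many issue terms appear in their description."""
    wanted = keywords(issue_terms)
    scored = [(name, len(wanted & keywords(desc))) for name, desc in runbooks.items()]
    matched = [item for item in scored if item[1] > 0]  # drop runbooks with no match
    # Highest score first; ties are broken alphabetically by name.
    return sorted(matched, key=lambda item: (-item[1], item[0]))


# Hypothetical custom runbooks and descriptions.
runbooks = {
    "Custom-RestartWorkers": "Restart workers after repeated errors",
    "Custom-RaiseLambdaTimeout": "Increase timeout for a Lambda function that is timing out",
    "Custom-RotateLogs": "Rotate and compress application logs",
}
ranking = rank_runbooks("Lambda function timeout errors", runbooks)
print(ranking)
```

One practical takeaway from this sketch is that clear, specific wording in a custom runbook's description plausibly helps it surface for the issues it is meant to address.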