Amazon Lookout for Metrics is no longer available to new customers. Existing Amazon Lookout for Metrics customers will be able to use the service until September 12, 2025, when we will end support for Amazon Lookout for Metrics. To help transition off of Amazon Lookout for Metrics, please read Transitioning off Amazon Lookout for Metrics
Create a datasource in Amazon S3
To get started with Lookout for Metrics, create a dataset that you can look for anomalies in. You can use data that you already have, or generate data with a sample application.
A Lookout for Metrics detector can read data in an Amazon S3 bucket in CSV or JSON lines format. Each record must have a numerical field to monitor (a measure) and a field that represents a date or timestamp. Records can also have categorical fields (dimensions), additional measures, and fields that are not monitored.
For example, the following example is a CSV file that has a measure field named Calls
, a timestamp
field named Date
, and text fields that could be used as dimensions.
Example CSV file
Date,EventSource,EventName,ReadOnly,AccessKeyId,Username,Calls 2021-03-12 22:00:07,kms.amazonaws.com,GenerateDataKey,true,none,none,1 2021-03-12 22:00:12,kms.amazonaws.com,Decrypt,true,ASIAXMPLV4CQYXGMLRQ,greg,1 2021-03-12 22:00:36,kms.amazonaws.com,Decrypt,true,ASIAXMPLVMP6KEG3BRA,michael,1
In the example, there is a header row with names for each field, fields are separated by commas, and values are
not enclosed in quotes. The header is optional, but it simplilfies configuration. Values can be enclosed in single
or double quotes. In addition to commas, tabs, spaces, and pipes (|
) can be used as delimeters.
Generate data with a sample application
The GitHub repository for this guide includes a sample application that you can use to generate a dataset with data from your AWS account. The sample application runs locally or in AWS Lambda. It uses the AWS SDK to read events from AWS CloudTrail, reformats them with Pandas, and stores them in Amazon S3.
Example lambda_function.py
– Processing data
... def process_events(end_time, service): start_time = end_time - timedelta(minutes=60) records = [] # call CloudTrail responses = get_events(service, start_time, end_time) records = [{k: event.get(k) for k in fields} for response in responses for event in response.get('Events')] # format timeseries dataframe df = pd.DataFrame(records) ...
To set up the application, follow the instructions in the README
file
(data-processor) data-processor$
./5-backfill.sh
In Lambda, the application runs once every hour to generate a timeseries for the previous hour. The backfill script runs the application locally to generate two weeks worth of data and store it in the application's S3 bucket.
Collect existing data
To use existing data to run a backtest, collect it in one or more files and store it in a single folder in an Amazon S3 bucket. The data must contain timestamped entries for at least 285 intervals, where an interval is 5 minutes, 10 minutes, 1 hour, or 1 day long. Each file can have entries for one interval or multiple intervals.
s3://lookoutmetrics-dataset-1234567891012/backtest/20210104-20210110.csv 20210111-20210117.csv 20210118-20210125.csv
To use a detector in continuous mode, you need to store it at a different path for each interval as the intervals occur. For an example of how to do this, run the sample application.
For more information about supported data formats and folder structures, see Managing a dataset in Amazon S3.
Download sample data
Sample data is available in the service's GitHub repository at github.com/aws-samples/amazon-lookout-for-metrics-samples