Amazon Managed Service for Apache Flink was previously known as Amazon Kinesis Data Analytics for Apache Flink.
Analyze logs with CloudWatch Logs Insights
After you've added a CloudWatch logging option to your application as described in the previous section, you can use CloudWatch Logs Insights to query your log streams for specific events or errors.
CloudWatch Logs Insights enables you to interactively search and analyze your log data in CloudWatch Logs.
For information on getting started with CloudWatch Logs Insights, see Analyze Log Data with CloudWatch Logs Insights.
Run a sample query
This section describes how to run a sample CloudWatch Logs Insights query.
Prerequisites
- Existing log groups and log streams set up in CloudWatch Logs.
- Existing logs stored in CloudWatch Logs.
If you use services such as AWS CloudTrail, Amazon Route 53, or Amazon VPC, you've probably already set up logs from those services to go to CloudWatch Logs. For more information about sending logs to CloudWatch Logs, see Getting Started with CloudWatch Logs.
Queries in CloudWatch Logs Insights return either a set of fields from log events, or the result of a mathematical aggregation or other operation performed on log events. This section demonstrates a query that returns a list of log events.
To run a CloudWatch Logs Insights sample query
- Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
- In the navigation pane, choose Insights.
- The query editor near the top of the screen contains a default query that returns the 20 most recent log events. Above the query editor, select a log group to query.
  When you select a log group, CloudWatch Logs Insights automatically detects fields in the data in the log group and displays them in Discovered fields in the right pane. It also displays a bar graph of log events in this log group over time. This bar graph shows the distribution of events in the log group that matches your query and time range, not just the events displayed in the table.
- Choose Run query.
  The results of the query appear. In this example, the results are the most recent 20 log events of any type.
- To see all of the fields for one of the returned log events, choose the arrow to the left of that log event.
For more information about how to run and modify CloudWatch Logs Insights queries, see Run and Modify a Sample Query.
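The console procedure above can also be approximated from a script. The following is a minimal sketch, assuming the boto3 SDK is installed and credentials are configured. The query string is an assumed equivalent of the console's default query (the 20 most recent events), and the helper names `build_start_query_params` and `run_query` are hypothetical.

```python
import time

# Assumed equivalent of the console's default query: the 20 most recent events.
DEFAULT_QUERY = "fields @timestamp, @message | sort @timestamp desc | limit 20"


def build_start_query_params(log_group, minutes_back=60, now=None):
    """Build keyword arguments for the CloudWatch Logs StartQuery API."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes_back * 60,  # epoch seconds
        "endTime": end,
        "queryString": DEFAULT_QUERY,
    }


def run_query(log_group, poll_seconds=1.0):
    """Start the query, poll until it leaves the queue, and return result rows."""
    import boto3  # imported lazily so the helper above works without boto3

    logs = boto3.client("logs")
    params = build_start_query_params(log_group)
    query_id = logs.start_query(**params)["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response["results"]
        time.sleep(poll_seconds)
```

Calling `run_query` with your application's log group name returns a list of rows, where each row is a list of field and value pairs.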
Review example queries
This section contains CloudWatch Logs Insights example queries for analyzing Managed Service for Apache Flink application logs. These queries search for several example error conditions, and serve as templates for writing queries that find other error conditions.
Note
Replace the Region (us-west-2), account ID (012345678901), and application name (YourApplication) in the following query examples with your application's Region, your account ID, and your application name.
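As an illustration of the substitution described in the note, the following sketch builds the `applicationARN` filter clause from the three values. It assumes the standard ARN layout `arn:aws:kinesisanalytics:<Region>:<account ID>:application/<name>`. The function names are hypothetical and not part of any AWS SDK.

```python
def application_arn(region, account_id, application_name):
    """Return the application ARN (assumed standard kinesisanalytics layout)."""
    return (
        f"arn:aws:kinesisanalytics:{region}:{account_id}"
        f":application/{application_name}"
    )


def arn_filter(region, account_id, application_name):
    """Return a Logs Insights filter clause that matches the application ARN."""
    arn = application_arn(region, account_id, application_name)
    # The pattern is delimited by slashes, so escape the slash inside the ARN.
    escaped = arn.replace("/", "\\/")
    return f"filter applicationARN like /{escaped}/"


print(arn_filter("us-west-2", "012345678901", "YourApplication"))
```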
Analyze operations: Distribution of tasks
The following CloudWatch Logs Insights query returns the number of tasks that the Apache Flink Job Manager distributes to each Task Manager. Set the query's time frame to match a single job run so that the query doesn't return tasks from previous jobs. For more information about parallelism, see Implement application scaling.
fields @timestamp, message | filter message like /Deploying/ | parse message " to flink-taskmanager-*" as @tmid | stats count(*) by @tmid | sort @timestamp desc | limit 2000
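To make the `parse` and `stats` steps of this query concrete, the following sketch reproduces them locally in Python. The log messages are invented for illustration and only approximate what a Job Manager writes; the regular expression approximates the glob pattern `" to flink-taskmanager-*"`.

```python
import re
from collections import Counter

# Hypothetical messages, made up for this example.
SAMPLE_MESSAGES = [
    "Deploying Source: Custom Source (1/2) (attempt #0) to flink-taskmanager-1",
    "Deploying Source: Custom Source (2/2) (attempt #0) to flink-taskmanager-2",
    "Deploying Map (1/1) (attempt #0) to flink-taskmanager-1",
    "Deploying Sink: Unnamed (1/2) (attempt #0) to flink-taskmanager-1",
    "Deploying Sink: Unnamed (2/2) (attempt #0) to flink-taskmanager-2",
    "Job started",  # dropped by the filter step
]


def tasks_per_task_manager(messages):
    """Count deployed tasks per Task Manager ID, like `stats count(*) by @tmid`."""
    counts = Counter()
    for message in messages:
        if "Deploying" not in message:  # filter message like /Deploying/
            continue
        match = re.search(r" to flink-taskmanager-(.*)", message)  # parse ... as @tmid
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


print(tasks_per_task_manager(SAMPLE_MESSAGES))
```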
The following CloudWatch Logs Insights query returns the subtasks assigned to each Task Manager. The total number of subtasks is the sum of every task's parallelism. Task parallelism is derived from operator parallelism, and is the same as the application's parallelism by default, unless you change it in code by specifying setParallelism. For more information about setting operator parallelism, see Setting the Parallelism: Operator Level.
fields @timestamp, @tmid, @subtask | filter message like /Deploying/ | parse message "Deploying * to flink-taskmanager-*" as @subtask, @tmid | sort @timestamp desc | limit 2000
For more information about task scheduling, see Jobs and Scheduling.
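The following sketch illustrates the two ideas in this section: grouping subtasks by Task Manager, and computing the total number of subtasks as the sum of every task's parallelism. The log messages, task names, and parallelism values are hypothetical, and the regular expression only approximates the glob pattern `"Deploying * to flink-taskmanager-*"`.

```python
import re

DEPLOY_PATTERN = re.compile(r"Deploying (.*?) to flink-taskmanager-(.*)")


def subtasks_by_task_manager(messages):
    """Map each Task Manager ID to the subtasks deployed to it."""
    assignment = {}
    for message in messages:
        match = DEPLOY_PATTERN.search(message)
        if match:
            subtask, tmid = match.groups()
            assignment.setdefault(tmid, []).append(subtask)
    return assignment


def total_subtasks(task_parallelism):
    """Total subtasks = sum of every task's parallelism."""
    return sum(task_parallelism.values())


# Hypothetical job: the Map task overrides the default parallelism of 2.
messages = [
    "Deploying Source (1/2) to flink-taskmanager-1",
    "Deploying Source (2/2) to flink-taskmanager-2",
    "Deploying Map (1/1) to flink-taskmanager-1",
]
print(subtasks_by_task_manager(messages))
print(total_subtasks({"Source": 2, "Map": 1, "Sink": 2}))
```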
Analyze operations: Change in parallelism
The following CloudWatch Logs Insights query returns changes to an application's parallelism (for example, due to automatic scaling). This query also returns manual changes to the application's parallelism. For more information about automatic scaling, see Use automatic scaling in Managed Service for Apache Flink.
fields @timestamp, @parallelism | filter message like /property: parallelism.default, / | parse message "default, *" as @parallelism | sort @timestamp asc
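As a rough local equivalent of this query, the following sketch extracts the parallelism value from matching messages and orders the results by timestamp. The timestamps and message text are made up for the example; real log lines may carry additional fields.

```python
import re


def parallelism_changes(events):
    """Return (timestamp, parallelism) pairs in ascending time order.

    `events` is a list of (timestamp, message) tuples.
    """
    changes = []
    for timestamp, message in events:
        # filter message like /property: parallelism.default, /
        if "property: parallelism.default, " not in message:
            continue
        match = re.search(r"default, (.*)", message)  # parse "default, *"
        if match:
            changes.append((timestamp, match.group(1).strip()))
    return sorted(changes, key=lambda change: change[0])  # sort @timestamp asc


# Hypothetical events.
events = [
    ("2024-01-01T10:30:00", "Loading configuration property: parallelism.default, 8"),
    ("2024-01-01T10:00:00", "Loading configuration property: parallelism.default, 4"),
    ("2024-01-01T10:15:00", "Checkpoint completed"),
]
print(parallelism_changes(events))
```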
Analyze errors: Access denied
The following CloudWatch Logs Insights query returns Access Denied logs.
fields @timestamp, @message, @messageType | filter applicationARN like /arn:aws:kinesisanalytics:us-west-2:012345678901:application\/YourApplication/ | filter @message like /AccessDenied/ | sort @timestamp desc
Analyze errors: Source or sink not found
The following CloudWatch Logs Insights query returns ResourceNotFound logs. ResourceNotFound logs result if a Kinesis source or sink is not found.
fields @timestamp, @message | filter applicationARN like /arn:aws:kinesisanalytics:us-west-2:012345678901:application\/YourApplication/ | filter @message like /ResourceNotFoundException/ | sort @timestamp desc
Analyze errors: Application task-related failures
The following CloudWatch Logs Insights query returns an application's task-related failure logs. These logs result if an application's status switches from RUNNING to RESTARTING.
fields @timestamp, @message | filter applicationARN like /arn:aws:kinesisanalytics:us-west-2:012345678901:application\/YourApplication/ | filter @message like /switched from RUNNING to RESTARTING/ | sort @timestamp desc
For applications that use Apache Flink version 1.8.2 or earlier, task-related failures cause the application status to switch from RUNNING to FAILED instead. For those versions, use the following query to search for application task-related failures:
fields @timestamp, @message | filter applicationARN like /arn:aws:kinesisanalytics:us-west-2:012345678901:application\/YourApplication/ | filter @message like /switched from RUNNING to FAILED/ | sort @timestamp desc
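The version rule above can be captured in a small helper that picks the message pattern to search for. This is only a sketch; the function names are hypothetical, and it assumes version strings made of dot-separated integers such as `1.8.2`.

```python
def failure_pattern(flink_version):
    """Return the status-transition text to search for, given a Flink version."""
    version = tuple(int(part) for part in flink_version.split("."))
    # Versions 1.8.2 and earlier report FAILED; later versions report RESTARTING.
    target = "FAILED" if version <= (1, 8, 2) else "RESTARTING"
    return f"switched from RUNNING to {target}"


def failure_filter(flink_version):
    """Return the Logs Insights filter clause for task-related failures."""
    return f"filter @message like /{failure_pattern(flink_version)}/"


print(failure_filter("1.8.2"))
print(failure_filter("1.11.1"))
```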