Using CloudWatch to monitor and log GraphQL API data

You can log and debug your GraphQL API using CloudWatch metrics and CloudWatch logs. These tools enable developers to monitor performance, troubleshoot issues, and optimize their GraphQL operations effectively.

CloudWatch metrics is a tool that provides a wide range of metrics to monitor API performance and usage. These metrics fall into two main categories:

General API Metrics: These include 4XXError and 5XXError for tracking client and server errors, Latency for measuring response times, Requests for monitoring total API calls, and TokensConsumed for tracking resource usage.
Real-time Subscription Metrics: These metrics focus on WebSocket connections and subscription activities. They include metrics for connection requests, successful connections, subscription registrations, message publishing, and active connections and subscriptions.

This guide also covers enhanced metrics, which offer more granular data on resolver performance, data source interactions, and individual GraphQL operations. These metrics provide deeper insights but incur additional charges.

CloudWatch Logs enables logging for your GraphQL APIs. You can configure logs at two levels of the API:

Request-level Logs: These capture overall request information, including HTTP headers, GraphQL queries, operation summaries, and subscription registrations.
Field-level Logs: These provide detailed information about individual field resolutions, including request and response mappings, and tracing information for each field.

You can configure logging, interpret log entries, and use log data for troubleshooting and optimization. AWS AppSync provides various log types that reveal your query’s execution, parsing, validation, and field resolution data.

Setup and configuration

To turn on automatic logging on a GraphQL API, use the AWS AppSync console.

Sign in to the AWS Management Console and open the AppSync console.
On the APIs page, choose the name of a GraphQL API.
On your API's homepage, in the navigation pane, choose Settings.
Under Logging, do the following:
1. Turn on Enable Logs.
2. For detailed request-level logging, select the check box under Include verbose content. (optional)
3. Under Field resolver log level, choose your preferred field-level logging level (None, Error, Info, Debug, or All). (optional)
4. Under Create or use an existing role, choose New role to create a new AWS Identity and Access Management (IAM) role that allows AWS AppSync to write logs to CloudWatch, or choose Existing role to select the Amazon Resource Name (ARN) of an existing IAM role in your AWS account.
Choose Save.
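If you manage the API outside the console, the same settings can be expressed as a logConfig payload. The sketch below assumes the AppSync UpdateGraphqlApi operation's logConfig shape (fieldLogLevel, cloudWatchLogsRoleArn, excludeVerboseContent); the account ID and role name in the example are hypothetical, so verify the field names against the current API reference before relying on them.

```python
"""Sketch: map the console logging options to an AppSync logConfig payload.

Assumption: the UpdateGraphqlApi logConfig fields are fieldLogLevel,
cloudWatchLogsRoleArn, and excludeVerboseContent.
"""

# Field resolver log levels offered in the console (step 3 above).
FIELD_LOG_LEVELS = {"NONE", "ERROR", "INFO", "DEBUG", "ALL"}


def build_log_config(role_arn: str, field_log_level: str = "NONE",
                     include_verbose_content: bool = False) -> dict:
    """Return a logConfig dict equivalent to the console settings."""
    level = field_log_level.upper()
    if level not in FIELD_LOG_LEVELS:
        raise ValueError(f"unsupported field log level: {field_log_level!r}")
    if not role_arn.startswith("arn:"):
        raise ValueError("role_arn must be an IAM role ARN")
    return {
        "fieldLogLevel": level,
        "cloudWatchLogsRoleArn": role_arn,
        # The console asks whether to *include* verbose content, while the
        # API field is phrased as an exclusion, so the flag is inverted.
        "excludeVerboseContent": not include_verbose_content,
    }


if __name__ == "__main__":
    # Hypothetical account ID and role name, for illustration only.
    cfg = build_log_config(
        "arn:aws:iam::111122223333:role/AWSAppSyncPushToCloudWatchLogsRole",
        field_log_level="error",
        include_verbose_content=True,
    )
    print(cfg)
```

With boto3, a payload like this would typically be passed as `appsync.update_graphql_api(apiId=..., name=..., logConfig=cfg)`; the update call also expects the API name, so check which other fields your API requires.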

Manual IAM role configuration

If you choose to use an existing IAM role, the role must grant AWS AppSync the required permissions to write logs to CloudWatch. To configure this manually, you must provide a service role ARN so that AWS AppSync can assume the role when writing the logs.

In the IAM console, create a new policy with the name AWSAppSyncPushToCloudWatchLogsPolicy that has the following definition:


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}

Next, create a new role with the name AWSAppSyncPushToCloudWatchLogsRole, and attach the newly created policy to the role. Edit the trust relationship for this role to the following:


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "appsync.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Copy the role ARN and use it when setting up logging for an AWS AppSync GraphQL API.
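The manual IAM steps above can also be scripted. This is a minimal sketch, assuming `iam` is a boto3 IAM client (`boto3.client("iam")`); the client is passed in rather than created inside the function so the logic can be exercised without AWS credentials. It reuses the policy and role names from the steps above and returns the role ARN you would then use in the logging settings.

```python
"""Sketch: create the AppSync logging role from the two policy documents.

Assumption: `iam` behaves like a boto3 IAM client (create_policy,
create_role, attach_role_policy).
"""
import json

# Permissions policy: lets the role write to CloudWatch Logs.
PERMISSIONS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["logs:CreateLogGroup", "logs:CreateLogStream",
                   "logs:PutLogEvents"],
        "Resource": "*",
    }],
}

# Trust policy: lets the AppSync service assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "appsync.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def create_logging_role(iam,
                        policy_name="AWSAppSyncPushToCloudWatchLogsPolicy",
                        role_name="AWSAppSyncPushToCloudWatchLogsRole") -> str:
    """Create the policy and role, attach them, and return the role ARN."""
    policy = iam.create_policy(PolicyName=policy_name,
                               PolicyDocument=json.dumps(PERMISSIONS_POLICY))
    role = iam.create_role(RoleName=role_name,
                           AssumeRolePolicyDocument=json.dumps(TRUST_POLICY))
    iam.attach_role_policy(RoleName=role_name,
                           PolicyArn=policy["Policy"]["Arn"])
    return role["Role"]["Arn"]
```

The function is not idempotent: running it twice fails when the policy or role already exists, so a real script would check for existing resources first.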

CloudWatch metrics

You can use CloudWatch metrics to monitor, and set alarms on, specific events such as requests that return HTTP error status codes or that have high latency. AWS AppSync emits the following metrics:

4XXError

Errors resulting from requests that are not valid due to an incorrect client configuration. Typically, these errors happen anywhere outside of GraphQL processing. For example, these errors can occur when the request includes an incorrect JSON payload or an incorrect query, when the service is throttled, or when the authorization settings are misconfigured.

Unit: Count. Use the Sum statistic to get the total occurrences of these errors.

5XXError

Errors encountered while running a GraphQL query. For example, this can occur when invoking a query for an empty or incorrect schema, or when the Amazon Cognito user pool ID or AWS Region is not valid. These errors can also occur if AWS AppSync encounters an issue while processing a request.

Unit: Count. Use the Sum statistic to get the total occurrences of these errors.

Latency

The time between when AWS AppSync receives a request from a client and when it returns a response to the client. This doesn’t include the network latency encountered for a response to reach the end devices.

Unit: Millisecond. Use the Average statistic to evaluate expected latencies.

Requests

The number of requests (queries + mutations) that all APIs in your account have processed, by Region.

Unit: Count. The number of all requests processed in a particular Region.

TokensConsumed

Tokens are allocated to requests based on the amount of resources (processing time and memory used) that a request consumes. Usually, each request consumes one token. However, a request that consumes large amounts of resources is allocated additional tokens as needed.

Unit: Count. The number of tokens allocated to requests processed in a particular Region.

NetworkBandwidthOutAllowanceExceeded

Note

In the AWS AppSync console, on the cache settings page, the Cache Health Metrics option allows you to enable this cache-related health metric.

The network packets dropped because the throughput exceeded the aggregated bandwidth limit. This is useful for diagnosing bottlenecks in a cache configuration. Data is recorded for a particular API by specifying the API_Id in the appsyncCacheNetworkBandwidthOutAllowanceExceeded metric.

Unit: Count. The number of packets dropped after exceeding the bandwidth limit for an API specified by ID.

EngineCPUUtilization

Note

In the AWS AppSync console, on the cache settings page, the Cache Health Metrics option allows you to enable this cache-related health metric.

The CPU utilization (percentage) allocated to the Redis OSS process. This is useful for diagnosing bottlenecks in a cache configuration. Data is recorded for a particular API by specifying the API_Id in the appsyncCacheEngineCPUUtilization metric.

Unit: Percent. The CPU percentage currently in use by the Redis OSS process for an API specified by ID.
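Because 4XXError and 5XXError are Count metrics read with the Sum statistic, a natural alarm fires when the per-period sum crosses a threshold. The sketch below only builds request parameters for a boto3 CloudWatch `put_metric_alarm` call; the `AWS/AppSync` namespace and the `GraphQLAPIId` dimension for these general metrics are assumptions not stated on this page, and the alarm naming scheme is hypothetical.

```python
"""Sketch: parameters for a CloudWatch alarm on AppSync 5XXError.

Assumptions: namespace "AWS/AppSync" and dimension "GraphQLAPIId".
Pass the result as cloudwatch.put_metric_alarm(**params).
"""


def server_error_alarm(api_id: str, threshold: int = 1,
                       period_s: int = 300) -> dict:
    """Alarm when the summed 5XXError count in one period reaches threshold."""
    return {
        "AlarmName": f"appsync-{api_id}-5xx",  # hypothetical naming scheme
        "Namespace": "AWS/AppSync",
        "MetricName": "5XXError",
        "Dimensions": [{"Name": "GraphQLAPIId", "Value": api_id}],
        # 5XXError is a Count metric; Sum gives total occurrences per period.
        "Statistic": "Sum",
        "Period": period_s,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Assumption: periods with no data points are treated as error-free.
        "TreatMissingData": "notBreaching",
    }
```

The same shape works for 4XXError by changing `MetricName`; for Latency you would switch the statistic to Average, matching the guidance above.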

Real-time subscriptions

All of the following metrics are emitted in a single dimension, GraphQLAPIId, so each metric is reported per GraphQL API ID. These metrics relate to GraphQL subscriptions over pure WebSockets:
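Because every real-time metric is keyed by the GraphQLAPIId dimension, reading one of them means filtering on that dimension and applying the statistic recommended below (usually Sum). This sketch assumes `cloudwatch` is a boto3 CloudWatch client and that the metrics are published in the `AWS/AppSync` namespace; the helper name is hypothetical.

```python
"""Sketch: total a real-time subscription metric for one API over a window.

Assumptions: `cloudwatch` behaves like a boto3 CloudWatch client and the
metrics live in the "AWS/AppSync" namespace.
"""
from datetime import datetime, timedelta, timezone


def metric_sum(cloudwatch, api_id: str, metric_name: str,
               hours: int = 1, period_s: int = 60) -> float:
    """Sum the per-period Sum datapoints of metric_name for the last `hours`."""
    end = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/AppSync",
        MetricName=metric_name,
        # Every real-time metric carries this single dimension.
        Dimensions=[{"Name": "GraphQLAPIId", "Value": api_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period_s,
        Statistics=["Sum"],
    )
    return sum(dp["Sum"] for dp in resp["Datapoints"])
```

For example, `metric_sum(cloudwatch, api_id, "ConnectSuccess")` would give the number of successful WebSocket connections in the last hour.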

ConnectRequests

The number of WebSocket connection requests made to AWS AppSync, including both successful and unsuccessful attempts.

Unit: Count. Use the Sum statistic to get the total number of connection requests.

ConnectSuccess

The number of successful WebSocket connections to AWS AppSync. It is possible to have connections without subscriptions.

Unit: Count. Use the Sum statistic to get the total occurrences of the successful connections.

ConnectClientError

The number of WebSocket connections that were rejected by AWS AppSync because of client-side errors. This could imply that the service is throttled or that the authorization settings are misconfigured.

Unit: Count. Use the Sum statistic to get the total occurrences of the client-side connection errors.

ConnectServerError

The number of errors that originated from AWS AppSync while processing connections. This usually happens when an unexpected server-side issue occurs.

Unit: Count. Use the Sum statistic to get the total occurrences of the server-side connection errors.
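Since ConnectRequests counts both successful and unsuccessful attempts, it works as a denominator for the other connection metrics. The hypothetical helper below turns Sum values collected over the same window into rates, which makes it easier to tell whether failures are mostly client-side (for example, throttling or misconfigured authorization) or server-side.

```python
"""Sketch: derive connection health rates from summed connection metrics.

All inputs are assumed to be Sum values taken over the same time window.
"""


def connect_health(connect_requests: float, connect_success: float,
                   client_errors: float, server_errors: float) -> dict:
    """Return success and error rates relative to ConnectRequests."""
    if connect_requests <= 0:
        # No connection attempts in the window: rates are undefined.
        return {"success_rate": None, "client_error_rate": None,
                "server_error_rate": None}
    return {
        "success_rate": connect_success / connect_requests,
        "client_error_rate": client_errors / connect_requests,
        "server_error_rate": server_errors / connect_requests,
    }
```

For example, 200 requests with 190 successes, 8 client errors, and 2 server errors give a 95% success rate, with most failures on the client side.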

DisconnectSuccess

The number of successful WebSocket disconnections from AWS AppSync.

Unit: Count. Use the Sum statistic to get the total occurrences of the successful disconnections.

DisconnectClientError

The number of client errors that originated from AWS AppSync while disconnecting WebSocket connections.

Unit: Count. Use the Sum statistic to get the total occurrences of the disconnection errors.

DisconnectServerError

The number of server errors that originated from AWS AppSync while disconnecting WebSocket connections.

Unit: Count. Use the Sum statistic to get the total occurrences of the disconnection errors.

SubscribeSuccess

The number of subscriptions that were successfully registered to AWS AppSync through WebSocket. It's possible to have connections without subscriptions, but it's not possible to have subscriptions without connections.

Unit: Count. Use the Sum statistic to get the total occurrences of the successful subscriptions.

SubscribeClientError

The number of subscriptions that were rejected by AWS AppSync because of client-side errors. This can occur when a JSON payload is incorrect, the service is throttled, or the authorization settings are misconfigured.

Unit: Count. Use the Sum statistic to get the total occurrences of the client-side subscription errors.

SubscribeServerError

The number of errors that originated from AWS AppSync while processing subscriptions. This usually happens when an unexpected server-side issue occurs.

Unit: Count. Use the Sum statistic to get the total occurrences of the server-side subscription errors.

UnsubscribeSuccess

The number of unsubscribe requests that were successfully processed.

Unit: Count. Use the Sum statistic to get the total occurrences of the successful unsubscribe requests.

UnsubscribeClientError

The number of unsubscribe requests that were rejected by AWS AppSync because of client-side errors.

Unit: Count. Use the Sum statistic to get the total occurrences of the client-side unsubscribe request errors.

UnsubscribeServerError

The number of errors that originated from AWS AppSync while processing unsubscribe requests. This usually happens when an unexpected server-side issue occurs.

Unit: Count. Use the Sum statistic to get the total occurrences of the server-side unsubscribe request errors.

PublishDataMessageSuccess

The number of subscription event messages that were successfully published.

Unit: Count. Use the Sum statistic to get the total number of subscription event messages that were successfully published.

PublishDataMessageClientError

The number of subscription event messages that failed to publish because of client-side errors.

Unit: Count. Use the Sum statistic to get the total occurrences of the client-side publishing subscription events errors.

PublishDataMessageServerError

The number of errors that originated from AWS AppSync while publishing subscription event messages. This usually happens when an unexpected server-side issue occurs.

Unit: Count. Use the Sum statistic to get the total occurrences of the server-side publishing subscription events errors.

PublishDataMessageSize

The size of subscription event messages published.

Unit: Bytes.

ActiveConnections

The number of concurrent WebSocket connections from clients to AWS AppSync in 1 minute.

Unit: Count. Use the Sum statistic to get the total opened connections.

ActiveSubscriptions

The number of concurrent subscriptions from clients in 1 minute.

Unit: Count. Use the Sum statistic to get the total active subscriptions.

ConnectionDuration

The amount of time that the connection stays open.

Unit: Milliseconds. Use the Average statistic to evaluate connection duration.

OutboundMessages

The number of metered messages successfully published. One metered message equals 5 kB of delivered data.

Unit: Count. Use the Sum statistic to get the total number of successfully published metered messages.

InboundMessageSuccess

The number of inbound messages successfully processed. Each subscription type invoked by a mutation generates one inbound message.

Unit: Count. Use the Sum statistic to get the total number of successfully processed inbound messages.

InboundMessageError

The number of inbound messages that failed processing due to invalid API requests, such as exceeding the 240 kB subscription payload size limit.

Unit: Count. Use the Sum statistic to get the total number of inbound messages with API-related processing failures.

InboundMessageFailure

The number of inbound messages that failed processing due to errors from AWS AppSync.

Unit: Count. Use the Sum statistic to get the total number of inbound messages with AWS AppSync-related processing failures.

InboundMessageDelayed

The number of delayed inbound messages. Inbound messages can be delayed when either the inbound message rate quota or outbound message rate quota is breached.

Unit: Count. Use the Sum statistic to get the total number of inbound messages that were delayed.

InboundMessageDropped

The number of dropped inbound messages. Inbound messages can be dropped when either the inbound message rate quota or outbound message rate quota is breached.

Unit: Count. Use the Sum statistic to get the total number of inbound messages that were dropped.

InvalidationSuccess

The number of subscriptions successfully invalidated (unsubscribed) by a mutation with $extensions.invalidateSubscriptions().

Unit: Count. Use the Sum statistic to retrieve the total number of subscriptions that were successfully unsubscribed.

InvalidationRequestSuccess

The number of invalidation requests successfully processed.

Unit: Count. Use the Sum statistic to get the total number of successfully processed invalidation requests.

InvalidationRequestError

The number of invalidation requests that failed processing due to invalid API requests.

Unit: Count. Use the Sum statistic to get the total number of invalidation requests with API-related processing failures.

InvalidationRequestFailure

The number of invalidation requests that failed processing due to errors from AWS AppSync.

Unit: Count. Use the Sum statistic to get the total number of invalidation requests with AWS AppSync-related processing failures.

InvalidationRequestDropped

The number of invalidation requests dropped when the invalidation request quota was exceeded.

Unit: Count. Use the Sum statistic to get the total number of dropped invalidation requests.

Comparing inbound and outbound messages

When a mutation is executed, subscription fields with the @aws_subscribe directive for that mutation are invoked. Each subscription invocation generates one inbound message. For example, if two subscription fields specify the same mutation in @aws_subscribe, then two inbound messages are generated when that mutation is called.

One outbound message equals 5 kB of data delivered to WebSocket clients. For example, sending 15 kB of data to 10 clients results in 30 outbound messages (15 kB * 10 clients / 5 kB per message = 30 messages).
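The two counting rules above can be written as a small worked example. The schema mapping is a hypothetical stand-in for the @aws_subscribe directives in a real schema (subscription field name to the list of mutations it subscribes to). How AWS AppSync rounds a fractional result is not stated here, so rounding the total up to a whole message is an assumption.

```python
"""Sketch: estimate inbound and outbound messages for one mutation call.

Assumptions: `subscriptions` maps each subscription field to the mutations
listed in its @aws_subscribe directive; fractional outbound totals are
rounded up to whole messages.
"""
import math

KB_PER_OUTBOUND_MESSAGE = 5  # one outbound message = 5 kB delivered


def inbound_messages(subscriptions: dict, mutation: str) -> int:
    """One inbound message per subscription field that lists the mutation."""
    return sum(1 for mutations in subscriptions.values()
               if mutation in mutations)


def outbound_messages(payload_kb: float, clients: int) -> int:
    """Total delivered kB divided by 5 kB per message (rounded up: assumed)."""
    return math.ceil(payload_kb * clients / KB_PER_OUTBOUND_MESSAGE)


if __name__ == "__main__":
    schema = {  # hypothetical subscription fields
        "onCreatePost": ["createPost"],
        "onPostChanged": ["createPost", "updatePost"],
    }
    print(inbound_messages(schema, "createPost"))  # two fields -> 2
    print(outbound_messages(15, 10))  # 15 kB * 10 clients / 5 kB -> 30
```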

You can request quota increases for either inbound or outbound messages. For more information, see AWS AppSync endpoints and quotas in the AWS General Reference guide and the instructions for Requesting a quota increase in the Service Quotas User Guide.

Enhanced metrics

Enhanced metrics emit granular data on API usage and performance such as AWS AppSync request and error counts, latency, and cache hits/misses. All enhanced metric data is sent to your CloudWatch account, and you can configure the types of data that will be sent.

Note

Additional charges are applied when using enhanced metrics. For more information, see detailed monitoring pricing tiers in Amazon CloudWatch pricing.
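Because enhanced metrics are billed, it can help to keep their scope narrow, for example by collecting resolver metrics only for resolvers you explicitly enable. The sketch below builds an enhanced-metrics settings payload; the field names and enum values (resolverLevelMetricsBehavior, dataSourceLevelMetricsBehavior, operationLevelMetricsConfig) are assumed from the AppSync API's EnhancedMetricsConfig type and should be checked against the current API reference.

```python
"""Sketch: build an AppSync enhancedMetricsConfig payload.

Assumption: field names and enum values follow the AppSync API's
EnhancedMetricsConfig type; verify before use.
"""


def build_enhanced_metrics_config(per_resolver: bool = True,
                                  per_data_source: bool = True,
                                  operation_metrics: bool = False) -> dict:
    """Per-resolver/per-data-source modes limit metrics to opted-in items."""
    return {
        "resolverLevelMetricsBehavior": (
            "PER_RESOLVER_METRICS" if per_resolver
            else "FULL_REQUEST_RESOLVER_METRICS"),
        "dataSourceLevelMetricsBehavior": (
            "PER_DATA_SOURCE_METRICS" if per_data_source
            else "FULL_REQUEST_DATA_SOURCE_METRICS"),
        "operationLevelMetricsConfig": (
            "ENABLED" if operation_metrics else "DISABLED"),
    }
```

The defaults here favor the narrower per-resolver and per-data-source modes; switching to the full-request modes emits metrics for every resolver and data source in each request.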

You can configure these metrics on several settings pages in the AWS AppSync console. On the API settings page, the Enhanced Metrics section allows you to enable or disable the following items:

Resolver metrics behavior: These options control how additional metrics for resolvers are collected. You can choose to enable full request resolver metrics (metrics enabled for all resolvers in requests) or per-resolver metrics (metrics only enabled for resolvers where the configuration is set to enabled). The following options are available:

GraphQL errors per resolver (GraphQLError)

The number of GraphQL errors that occurred per resolver.