Neptune CloudWatch Metrics - Amazon Neptune

Neptune CloudWatch Metrics

Note

Amazon Neptune sends metrics to CloudWatch only when they have a non-zero value.

For all Neptune metrics, the aggregation granularity is 5 minutes.
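Because zero values are not sent, a series retrieved from CloudWatch can have gaps. The following is a minimal sketch, not an official utility, of filling those gaps on a 5-minute grid. The function name `fill_missing_with_zero` and the input shape (a dict keyed by period-aligned timestamps) are assumptions made for illustration. It also assumes that a missing datapoint means the value was zero, which may not hold if the instance was unavailable.

```python
from datetime import datetime, timedelta, timezone

PERIOD = timedelta(minutes=5)  # aggregation granularity stated in the note above


def fill_missing_with_zero(datapoints, start, end, period=PERIOD):
    """Return (timestamp, value) pairs from start (inclusive) to end (exclusive).

    datapoints: dict mapping period-aligned timestamps to values (hypothetical shape).
    Timestamps that are absent are assumed to mean the value was zero.
    """
    filled = []
    current = start
    while current < end:
        filled.append((current, datapoints.get(current, 0.0)))
        current += period
    return filled


start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
sparse = {
    start: 12.0,
    start + 2 * PERIOD: 3.0,
}
dense = fill_missing_with_zero(sparse, start, start + 4 * PERIOD)
print([value for _, value in dense])  # [12.0, 0.0, 3.0, 0.0]
```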

Neptune CloudWatch Metrics

The following table lists the CloudWatch metrics that Neptune supports.

Note

All cumulative metrics are reset to zero whenever the server restarts, whether for maintenance, a reboot, or recovering from a crash.
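When you compute growth from a cumulative metric, a restart shows up as a drop in the value. The sketch below illustrates one way to handle that, assuming that any decrease between two consecutive samples means the counter was reset to zero. The function name `total_increase` is hypothetical.

```python
def total_increase(samples):
    """Sum the growth of a cumulative counter across a list of samples.

    Any drop between consecutive samples is treated as a reset to zero
    (an assumption), so the new value counts as growth since the restart.
    """
    total = 0.0
    previous = None
    for value in samples:
        if previous is not None:
            if value >= previous:
                total += value - previous
            else:
                total += value  # counter restarted from zero
        previous = value
    return total


# 100 -> 150 adds 50, the drop to 20 adds 20, and 20 -> 60 adds 40.
print(total_increase([100, 150, 20, 60]))  # 110.0
```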

Metric Description

BackupRetentionPeriodStorageUsed

The total amount of backup storage, in bytes, used to support the Neptune DB cluster's backup retention window. Included in the total reported by the TotalBackupStorageBilled metric.

BufferCacheHitRatio

The percentage of requests that are served by the buffer cache. This metric can be useful in diagnosing query latency, because cache misses induce significant latency. If the cache hit ratio is below 99.9%, consider upgrading the instance type to cache more data in memory.
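One way to act on the 99.9 guideline is a CloudWatch alarm. The sketch below only builds the parameter dictionary for the boto3 `put_metric_alarm` call and does not contact AWS. The `AWS/Neptune` namespace, the `DBInstanceIdentifier` dimension, the alarm name, and the choice of three evaluation periods are assumptions that are not stated on this page, so verify them against your own account before use.

```python
def buffer_cache_alarm_params(instance_id, threshold=99.9, periods=3):
    """Build keyword arguments for CloudWatch put_metric_alarm (illustrative)."""
    return {
        "AlarmName": f"neptune-{instance_id}-low-buffer-cache-hit-ratio",  # hypothetical name
        "Namespace": "AWS/Neptune",  # assumed namespace
        "MetricName": "BufferCacheHitRatio",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],  # assumed dimension
        "Statistic": "Average",
        "Period": 300,  # 5-minute granularity, in seconds
        "EvaluationPeriods": periods,  # arbitrary choice for this sketch
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "missing",
    }


params = buffer_cache_alarm_params("my-neptune-instance")
print(params["MetricName"], params["ComparisonOperator"], params["Threshold"])

# To create the alarm (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```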

ClusterReplicaLag

For a read replica, the amount of lag when replicating updates from the primary instance, in milliseconds.

ClusterReplicaLagMaximum

The maximum amount of lag between the primary instance and each Neptune DB instance in the DB cluster, in milliseconds.

ClusterReplicaLagMinimum

The minimum amount of lag between the primary instance and each Neptune DB instance in the DB cluster, in milliseconds.

CPUUtilization

The percentage of CPU utilization.

EngineUptime

The amount of time that the instance has been running, in seconds.

FreeableMemory

The amount of available random access memory, in bytes.

GlobalDbDataTransferBytes

The number of bytes of redo log data transferred from the primary AWS Region to a secondary AWS Region in a Neptune global database.

GlobalDbReplicatedWriteIO

The number of write I/O operations replicated from the primary AWS Region in the global database to the cluster volume in a secondary AWS Region.

The billing calculations for each DB cluster in a Neptune global database use the VolumeWriteIOPs metric to account for writes performed within that cluster. For the primary DB cluster, the billing calculations use GlobalDbReplicatedWriteIO to account for the cross-region replication to secondary DB clusters.

GlobalDbProgressLag

The number of milliseconds that a secondary cluster is behind the primary cluster for both user transactions and system transactions.

GremlinRequestsPerSec

Number of requests per second to the Gremlin engine.

GremlinWebSocketOpenConnections

The number of open WebSocket connections to Neptune.

LoaderRequestsPerSec

Number of loader requests per second.

MainRequestQueuePendingRequests

The number of requests waiting in the input queue pending execution. Neptune starts throttling requests when they exceed the maximum queue capacity.

NCUUtilization

Only applicable to a Neptune Serverless DB instance or DB cluster. At an instance level, reports a percentage calculated as the number of Neptune capacity units (NCUs) currently being used by the instance in question, divided by the maximum NCU capacity setting for the cluster. An NCU, or Neptune capacity unit, consists of 2 GiB (gibibyte) of memory (RAM), along with associated virtual processor capacity (vCPU) and networking.

At a cluster level, NCUUtilization reports the percentage of maximum capacity being used by the cluster as a whole.
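The instance-level calculation described above can be sketched as follows. The function names are hypothetical; only the ratio (NCUs in use divided by the cluster's maximum NCU setting) and the 2 GiB of memory per NCU come from the description.

```python
GIB_PER_NCU = 2  # memory per Neptune capacity unit, from the description above


def ncu_utilization_percent(ncus_in_use, max_ncu_setting):
    """Percentage of the cluster's maximum NCU setting used by one instance."""
    if max_ncu_setting <= 0:
        raise ValueError("max_ncu_setting must be positive")
    return 100.0 * ncus_in_use / max_ncu_setting


def ncu_memory_gib(ncus):
    """Approximate memory, in GiB, that corresponds to a number of NCUs."""
    return ncus * GIB_PER_NCU


# An instance using 8 NCUs in a cluster whose maximum setting is 32 NCUs.
print(ncu_utilization_percent(8, 32))  # 25.0
print(ncu_memory_gib(8))               # 16
```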

NetworkThroughput

The amount of network throughput both received from and transmitted to clients by each instance in the Neptune DB cluster, in bytes per second. This throughput does not include network traffic between instances in the DB cluster and the cluster volume.

NetworkTransmitThroughput

The amount of outgoing network throughput transmitted to clients by each instance in the Neptune DB cluster, in bytes per second. This throughput does not include network traffic between instances in the DB cluster and the cluster volume.

NumIndexDeletesPerSec

Number of deletes from individual indexes. Deletes from each index are counted individually. This includes the deletes that may get rolled back if a query encounters an error.

NumIndexInsertsPerSec

Number of inserts to individual indexes. Inserts to each index are counted separately. This includes the inserts that may get rolled back if a query encounters an error.

NumIndexReadsPerSec

Number of statements scanned from any index. Any access pattern starts with a search on an index and reads of all matching statements. An increase in this metric can cause an increase in query latencies or CPU utilization.

NumResultCacheHit

Number of Gremlin result cache hits.

NumResultCacheMiss

Number of Gremlin result cache misses.

NumTxCommitted

The number of transactions successfully committed per second.

NumTxOpened

The number of transactions opened on the server per second.

NumTxRolledBack

For write queries, the number of transactions per second rolled back on the server because of errors. For read-only queries, this metric is equal to the number of completed read-only transactions per second.

NumUndoPagesPurged

The number of batches purged, which is an indicator of progress in purging. This metric applies only to the writer instance. The value is 0 for reader instances.

OpenCypherRequestsPerSec

Number of requests per second (both HTTPS and Bolt) to the openCypher engine.

OpenCypherBoltOpenConnections

The number of open Bolt connections to Neptune.

ResultCacheSizeInBytes

Total estimated size (in bytes) of all cached items in the Gremlin result cache.

ResultCacheItemCount

Number of items in the Gremlin result cache.

ResultCacheOldestItemTimestamp

The timestamp of the oldest item cached in the Gremlin result cache.

ResultCacheNewestItemTimestamp

The timestamp of the newest item cached in the Gremlin result cache.

ServerlessDatabaseCapacity

As an instance-level metric, ServerlessDatabaseCapacity reports the current instance capacity of a given Neptune serverless instance, in NCUs. An NCU, or Neptune capacity unit, consists of 2 GiB (gibibyte) of memory (RAM), along with associated virtual processor capacity (vCPU) and networking.

At the cluster level, ServerlessDatabaseCapacity reports the average of all the ServerlessDatabaseCapacity values of the DB instances in the cluster.
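As a small illustration of the cluster-level value, the sketch below averages per-instance capacities. The instance names and the function name `cluster_serverless_capacity` are made up for the example.

```python
def cluster_serverless_capacity(instance_capacities):
    """Average the per-instance ServerlessDatabaseCapacity values, in NCUs."""
    values = list(instance_capacities.values())
    if not values:
        raise ValueError("at least one instance is required")
    return sum(values) / len(values)


# Hypothetical cluster with one writer and two readers.
capacities = {"writer": 16.0, "reader-1": 4.0, "reader-2": 4.0}
print(cluster_serverless_capacity(capacities))  # 8.0
```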

SnapshotStorageUsed

The total amount of backup storage consumed by all snapshots for a Neptune DB cluster outside its backup retention window, in bytes. Included in the total reported by the TotalBackupStorageBilled metric.

SparqlRequestsPerSec

The number of requests per second to the SPARQL engine.

StatsNumStatementsScanned

The total number of statements scanned for DFE statistics since the server started.

Every time statistics computation is triggered, this number increases, but when no computation is happening, it remains static. As a result, if you graph it over time, you can tell when computation happened and when it didn't:

[Figure: graph of StatsNumStatementsScanned values over time]

By looking at the slope of the graph in periods where the metric is increasing, you can also tell how quickly the computation was going.

If there is no such metric, it means that the statistics feature is disabled on your DB cluster, or that the engine version you're running doesn't have the statistics feature. If the metric value is zero, it means that no statistics computation has occurred.
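The same reading of the graph can be done programmatically. The sketch below is an illustration, not a Neptune tool. It finds the stretches where StatsNumStatementsScanned was rising and reports the slope of each stretch in statements per second. The sample format (time in seconds, cumulative value) and the function names are assumptions.

```python
def _close_run(run_start, run_end):
    """Turn a rising stretch into (start_time, end_time, statements_per_second)."""
    (t_start, v_start), (t_end, v_end) = run_start, run_end
    return (t_start, t_end, (v_end - v_start) / (t_end - t_start))


def computation_intervals(samples):
    """Find periods where the cumulative metric was increasing.

    samples: list of (time_in_seconds, cumulative_value), sorted by time.
    """
    intervals = []
    run_start = None  # sample at which the current rising stretch began
    for previous, current in zip(samples, samples[1:]):
        if current[1] > previous[1]:
            if run_start is None:
                run_start = previous
        elif run_start is not None:
            # Flat or dropping value: the rising stretch ended at `previous`.
            intervals.append(_close_run(run_start, previous))
            run_start = None
    if run_start is not None:
        intervals.append(_close_run(run_start, samples[-1]))
    return intervals


samples = [(0, 0), (300, 0), (600, 3000), (900, 9000),
           (1200, 9000), (1500, 9000), (1800, 12000)]
print(computation_intervals(samples))
# [(300, 900, 15.0), (1500, 1800, 10.0)]
```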

StorageNetworkThroughput

The amount of network throughput received from and sent to the storage subsystem by each instance in the Neptune DB cluster.

TotalBackupStorageBilled

The total amount of backup storage for which you are billed for a given Neptune DB cluster, in bytes. Includes the backup storage measured by the BackupRetentionPeriodStorageUsed and SnapshotStorageUsed metrics.

TotalRequestsPerSec

The total number of requests per second to the server from all sources.

TotalClientErrorsPerSec

The total number per second of requests that errored out because of client-side issues.

TotalServerErrorsPerSec

The total number per second of requests that errored out on the server because of internal failures.

UndoLogListSize

The count of undo logs in the undo log list.

Undo logs contain records of committed transactions that expire when all active transactions are more recent than the commit time. The expired records are periodically purged. Records for delete operations can take longer to purge than records for other types of transaction.

Purging is done exclusively by the DB cluster's writer instance, so the rate of purging is dependent on the writer instance type. If the UndoLogListSize is high and growing in your DB cluster, upgrade the writer instance to increase the purge rate.

Also, if you are upgrading to engine version 1.2.0.0 or higher from a version earlier than 1.2.0.0, first make sure that the UndoLogListSize value is close to 0. Because engine versions 1.2.0.0 and higher use a different format for undo logs, the upgrade can only begin after your previous undo logs have been fully purged. See Upgrading to 1.2.0.0 or above for more information.
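The two checks described above can be sketched as simple functions over recent UndoLogListSize samples. This page does not define what counts as "high" or "close to 0", so both thresholds below are hypothetical parameters that you would choose yourself, and the function names are illustrative only.

```python
def is_growing(values, min_points=3):
    """True if the series never decreases and ends higher than it starts."""
    if len(values) < min_points:
        return False
    never_drops = all(later >= earlier for earlier, later in zip(values, values[1:]))
    return never_drops and values[-1] > values[0]


def should_consider_writer_upgrade(values, high_threshold):
    """High and growing: the case where upgrading the writer is suggested."""
    return is_growing(values) and values[-1] >= high_threshold


def undo_logs_purged_enough(values, near_zero_threshold):
    """Latest value is close to 0 (threshold is an assumption)."""
    return bool(values) and values[-1] <= near_zero_threshold


growing = [10_000, 25_000, 60_000]
draining = [60_000, 30_000, 10]
print(should_consider_writer_upgrade(growing, high_threshold=50_000))  # True
print(undo_logs_purged_enough(draining, near_zero_threshold=100))      # True
```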

VolumeBytesUsed

The total amount of storage allocated to your Neptune DB cluster, in bytes. This is the amount of storage for which you are billed. It is the maximum amount of storage allocated to your DB cluster at any point in its existence, not the amount you are currently using (see Neptune storage billing).
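The "maximum at any point" behavior described above is a high-water mark. The sketch below illustrates the idea with a running maximum over a made-up series of allocation sizes. It is not how Neptune computes the metric internally, and the function name is hypothetical.

```python
from itertools import accumulate


def high_water_mark(allocated_bytes_over_time):
    """Running maximum of allocated storage, one value per input sample."""
    return list(accumulate(allocated_bytes_over_time, max))


# Allocation rises, falls after deletes, then rises again (illustrative numbers).
allocated = [10, 40, 25, 30, 55]
print(high_water_mark(allocated))  # [10, 40, 40, 40, 55]
```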

VolumeReadIOPs

The total number of billed read I/O operations from a cluster volume. Billed read operations are calculated at the cluster volume level, aggregated from all instances in the Neptune DB cluster, and then reported at 5-minute intervals.

VolumeWriteIOPs

The total number of write disk I/O operations to the cluster volume, reported at 5-minute intervals.
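Because VolumeReadIOPs and VolumeWriteIOPs are totals reported at 5-minute intervals, a per-second average can be useful for comparison with rate metrics. The sketch below assumes that each datapoint is the total for one full 5-minute interval. The function name `average_ops_per_second` is hypothetical.

```python
INTERVAL_SECONDS = 5 * 60  # reporting interval stated above


def average_ops_per_second(total_operations, interval_seconds=INTERVAL_SECONDS):
    """Convert an interval total into an average number of operations per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return total_operations / interval_seconds


# 45,000 operations reported for one interval average out to 150 per second.
print(average_ops_per_second(45_000))  # 150.0
```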

CloudWatch Metrics That Are Now Deprecated in Neptune

The following Neptune metrics are deprecated. They are still supported, but may be removed in the future as new and better metrics become available.

Metric

Description

GremlinHttp1xx

Number of HTTP 1xx responses for the Gremlin endpoint per second.

We recommend that you use the new Http1xx combined metric instead.

GremlinHttp2xx

Number of HTTP 2xx responses for the Gremlin endpoint per second.

We recommend that you use the new Http2xx combined metric instead.

GremlinHttp4xx

Number of HTTP 4xx errors for the Gremlin endpoint per second.

We recommend that you use the new Http4xx combined metric instead.

GremlinHttp5xx

Number of HTTP 5xx errors for the Gremlin endpoint per second.

We recommend that you use the new Http5xx combined metric instead.

GremlinErrors

Number of errors in Gremlin traversals.

GremlinRequests

Number of requests to the Gremlin engine.

GremlinWebSocketSuccess

Number of successful WebSocket connections to the Gremlin endpoint per second.

GremlinWebSocketClientErrors

Number of WebSocket client errors on the Gremlin endpoint per second.

GremlinWebSocketServerErrors

Number of WebSocket server errors on the Gremlin endpoint per second.

GremlinWebSocketAvailableConnections

Number of potential WebSocket connections currently available.

Http100

Number of HTTP 100 responses for the endpoint per second.

We recommend that you use the new Http1xx combined metric instead.

Http101

Number of HTTP 101 responses for the endpoint per second.

We recommend that you use the new Http1xx combined metric instead.

Http1xx

Number of HTTP 1xx responses for the endpoint per second.

Http200

Number of HTTP 200 responses for the endpoint per second.

We recommend that you use the new Http2xx combined metric instead.

Http2xx

Number of HTTP 2xx responses for the endpoint per second.

Http400

Number of HTTP 400 errors for the endpoint per second.

We recommend that you use the new Http4xx combined metric instead.

Http403

Number of HTTP 403 errors for the endpoint per second.

We recommend that you use the new Http4xx combined metric instead.

Http405

Number of HTTP 405 errors for the endpoint per second.

We recommend that you use the new Http4xx combined metric instead.

Http413

Number of HTTP 413 errors for the endpoint per second.

We recommend that you use the new Http4xx combined metric instead.

Http429

Number of HTTP 429 errors for the endpoint per second.

We recommend that you use the new Http4xx combined metric instead.

Http4xx

Number of HTTP 4xx errors for the endpoint per second.

Http500

Number of HTTP 500 errors for the endpoint per second.

We recommend that you use the new Http5xx combined metric instead.

Http501

Number of HTTP 501 errors for the endpoint per second.

We recommend that you use the new Http5xx combined metric instead.

Http5xx

Number of HTTP 5xx errors for the endpoint per second.

LoaderErrors

Number of errors from Loader requests.

LoaderRequests

Number of loader requests.

SparqlHttp1xx

Number of HTTP 1xx responses for the SPARQL endpoint per second.

We recommend that you use the new Http1xx combined metric instead.

SparqlHttp2xx

Number of HTTP 2xx responses for the SPARQL endpoint per second.

We recommend that you use the new Http2xx combined metric instead.

SparqlHttp4xx

Number of HTTP 4xx errors for the SPARQL endpoint per second.

We recommend that you use the new Http4xx combined metric instead.

SparqlHttp5xx

Number of HTTP 5xx errors for the SPARQL endpoint per second.

We recommend that you use the new Http5xx combined metric instead.

SparqlErrors

Number of errors in SPARQL queries.

SparqlRequests

Number of requests to the SPARQL engine.

StatusErrors

Number of errors from the status endpoint.

StatusRequests

Number of requests to the status endpoint.
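If you maintain dashboards or alarms that still reference the deprecated names, a lookup table can help with migration. The sketch below encodes only the recommendations given in the table above. Metrics for which the table names no replacement return `None`. The names `REPLACEMENTS` and `replacement_for` are illustrative.

```python
# Combined metric -> deprecated metrics that the table above says it replaces.
_RECOMMENDED = {
    "Http1xx": ["GremlinHttp1xx", "SparqlHttp1xx", "Http100", "Http101"],
    "Http2xx": ["GremlinHttp2xx", "SparqlHttp2xx", "Http200"],
    "Http4xx": ["GremlinHttp4xx", "SparqlHttp4xx", "Http400", "Http403",
                "Http405", "Http413", "Http429"],
    "Http5xx": ["GremlinHttp5xx", "SparqlHttp5xx", "Http500", "Http501"],
}

# Flatten into deprecated name -> recommended combined metric.
REPLACEMENTS = {
    deprecated: combined
    for combined, names in _RECOMMENDED.items()
    for deprecated in names
}


def replacement_for(metric_name):
    """Return the recommended combined metric, or None if the table lists none."""
    return REPLACEMENTS.get(metric_name)


print(replacement_for("Http429"))        # Http4xx
print(replacement_for("GremlinErrors"))  # None
```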