Checking uptime for Aurora clusters and instances - Amazon Aurora

Checking uptime for Aurora clusters and instances

You can check and monitor the length of time since the last reboot for each DB instance in your Aurora cluster. The Amazon CloudWatch metric EngineUptime reports the number of seconds since the last time a DB instance was started. You can examine this metric at a point in time to find out the uptime for the DB instance. You can also monitor this metric over time to detect when the instance is rebooted.

You can also examine the EngineUptime metric at the cluster level. The Minimum and Maximum dimensions report the smallest and largest uptime values for all DB instances in the cluster. To check the most recent time when any reader instance in a cluster was rebooted, or restarted for another reason, monitor the cluster-level metric using the Minimum dimension. To check which instance in the cluster has gone the longest without a reboot, monitor the cluster-level metric using the Maximum dimension. For example, you might want to confirm that all DB instances in the cluster were rebooted after a configuration change.

Tip

For long-term monitoring, we recommend monitoring the EngineUptime metric for individual instances instead of at the cluster level. The cluster-level EngineUptime metric is set to zero when a new DB instance is added to the cluster. Such cluster changes can happen as part of maintenance and scaling operations such as those performed by Auto Scaling.

The following CLI examples show how to examine the EngineUptime metric for the writer and reader instances in a cluster. The examples use a cluster named tpch100g. This cluster has a writer DB instance instance-1234. It also has two reader DB instances, instance-7448 and instance-6305.

First, the reboot-db-instance command reboots one of the reader instances. The wait command waits until the instance is finished rebooting.

$ aws rds reboot-db-instance --db-instance-identifier instance-6305 { "DBInstance": { "DBInstanceIdentifier": "instance-6305", "DBInstanceStatus": "rebooting", ... $ aws rds wait db-instance-available --db-instance-id instance-6305

The CloudWatch get-metric-statistics command examines the EngineUptime metric over the last five minutes at one-minute intervals. The uptime for the instance-6305 instance is reset to zero and begins counting upwards again. This AWS CLI example for Linux uses $() variable substitution to insert the appropriate timestamps into the CLI commands. It also uses the Linux sort command to order the output by the time the metric was collected. That timestamp value is the third field in each line of output.

$ aws cloudwatch get-metric-statistics --metric-name "EngineUptime" \ --start-time "$(date -d '5 minutes ago')" --end-time "$(date -d 'now')" \ --period 60 --namespace "AWS/RDS" --statistics Maximum \ --dimensions Name=DBInstanceIdentifier,Value=instance-6305 --output text \ | sort -k 3 EngineUptime DATAPOINTS 231.0 2021-03-16T18:19:00+00:00 Seconds DATAPOINTS 291.0 2021-03-16T18:20:00+00:00 Seconds DATAPOINTS 351.0 2021-03-16T18:21:00+00:00 Seconds DATAPOINTS 411.0 2021-03-16T18:22:00+00:00 Seconds DATAPOINTS 471.0 2021-03-16T18:23:00+00:00 Seconds

The minimum uptime for the cluster is reset to zero because one of the instances in the cluster was rebooted. The maximum uptime for the cluster isn't reset because at least one of the DB instances in the cluster remained available.

$ aws cloudwatch get-metric-statistics --metric-name "EngineUptime" \ --start-time "$(date -d '5 minutes ago')" --end-time "$(date -d 'now')" \ --period 60 --namespace "AWS/RDS" --statistics Minimum \ --dimensions Name=DBClusterIdentifier,Value=tpch100g --output text \ | sort -k 3 EngineUptime DATAPOINTS 63099.0 2021-03-16T18:12:00+00:00 Seconds DATAPOINTS 63159.0 2021-03-16T18:13:00+00:00 Seconds DATAPOINTS 63219.0 2021-03-16T18:14:00+00:00 Seconds DATAPOINTS 63279.0 2021-03-16T18:15:00+00:00 Seconds DATAPOINTS 51.0 2021-03-16T18:16:00+00:00 Seconds $ aws cloudwatch get-metric-statistics --metric-name "EngineUptime" \ --start-time "$(date -d '5 minutes ago')" --end-time "$(date -d 'now')" \ --period 60 --namespace "AWS/RDS" --statistics Maximum \ --dimensions Name=DBClusterIdentifier,Value=tpch100g --output text \ | sort -k 3 EngineUptime DATAPOINTS 63389.0 2021-03-16T18:16:00+00:00 Seconds DATAPOINTS 63449.0 2021-03-16T18:17:00+00:00 Seconds DATAPOINTS 63509.0 2021-03-16T18:18:00+00:00 Seconds DATAPOINTS 63569.0 2021-03-16T18:19:00+00:00 Seconds DATAPOINTS 63629.0 2021-03-16T18:20:00+00:00 Seconds

Then another reboot-db-instance command reboots the writer instance of the cluster. Another wait command pauses until the writer instance is finished rebooting.

$ aws rds reboot-db-instance --db-instance-identifier instance-1234 { "DBInstanceIdentifier": "instance-1234", "DBInstanceStatus": "rebooting", ... $ aws rds wait db-instance-available --db-instance-id instance-1234

Now the EngineUptime metric for the writer instance shows that the instance instance-1234 was rebooted recently. The reader instance instance-6305 was also rebooted automatically along with the writer instance. This cluster is running Aurora MySQL 2.09, which doesn't keep the reader instances running as the writer instance reboots.

$ aws cloudwatch get-metric-statistics --metric-name "EngineUptime" \ --start-time "$(date -d '5 minutes ago')" --end-time "$(date -d 'now')" \ --period 60 --namespace "AWS/RDS" --statistics Maximum \ --dimensions Name=DBInstanceIdentifier,Value=instance-1234 --output text \ | sort -k 3 EngineUptime DATAPOINTS 63749.0 2021-03-16T18:22:00+00:00 Seconds DATAPOINTS 63809.0 2021-03-16T18:23:00+00:00 Seconds DATAPOINTS 63869.0 2021-03-16T18:24:00+00:00 Seconds DATAPOINTS 41.0 2021-03-16T18:25:00+00:00 Seconds DATAPOINTS 101.0 2021-03-16T18:26:00+00:00 Seconds $ aws cloudwatch get-metric-statistics --metric-name "EngineUptime" \ --start-time "$(date -d '5 minutes ago')" --end-time "$(date -d 'now')" \ --period 60 --namespace "AWS/RDS" --statistics Maximum \ --dimensions Name=DBInstanceIdentifier,Value=instance-6305 --output text \ | sort -k 3 EngineUptime DATAPOINTS 411.0 2021-03-16T18:22:00+00:00 Seconds DATAPOINTS 471.0 2021-03-16T18:23:00+00:00 Seconds DATAPOINTS 531.0 2021-03-16T18:24:00+00:00 Seconds DATAPOINTS 49.0 2021-03-16T18:26:00+00:00 Seconds