Enhanced monitoring dashboard - AWS Database Migration Service

Enhanced monitoring dashboard

The enhanced monitoring dashboard provides comprehensive visibility into critical metrics related to your migration tasks and replication instances. It allows you to filter, aggregate, and visualize metrics for the specific resources you want to track. The dashboard directly publishes existing CloudWatch metrics, enabling you to monitor resource performance without altering data point sampling times.

Overview of the enhanced monitoring dashboard

The enhanced monitoring dashboard is available in the AWS DMS console. It provides a user-friendly interface with intuitive visualizations and charts that enable you to monitor, analyze, and optimize your data migration processes effectively. With enhanced monitoring, you can streamline monitoring processes and quickly identify potential issues by viewing all relevant information in a centralized location.

On the enhanced monitoring dashboard, you can view metrics for tasks and replication instances and details for endpoints. You can also track the number of active CloudWatch alarms and the service health status for the current Region. The dashboard is available in all Commercial Regions where AWS DMS is available. There is no additional cost to use this dashboard.

Note

The enhanced monitoring dashboard doesn't support AWS DMS serverless replications.

Viewing metrics in the enhanced monitoring dashboard

To view metrics in the enhanced monitoring dashboard, make sure that you have an IAM role with CloudWatch permissions. In addition, you need the cloudwatch:DescribeAlarms and health:DescribeEvents permissions. Finally, to access CloudWatch metrics in the dashboard, your IAM policies must also allow the GetMetricData API. If you lack these permissions, or have permissions for only a subset of these APIs, you won't be able to view the metrics in the enhanced monitoring dashboard. For information about the IAM permissions you need, see IAM permissions needed to use AWS DMS.
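
As a minimal sketch, the three permissions named in this section could be collected into a single identity-based policy statement. The statement ID and the "Resource": "*" scope are assumptions made for illustration, and the sketch covers only the actions named here, not the full set of permissions that AWS DMS itself requires.

```python
import json

# Actions named in this section. Permissions needed to operate AWS DMS
# itself are out of scope for this sketch.
DASHBOARD_ACTIONS = [
    "cloudwatch:GetMetricData",
    "cloudwatch:DescribeAlarms",
    "health:DescribeEvents",
]


def build_dashboard_policy(actions=DASHBOARD_ACTIONS):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DmsEnhancedMonitoringRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # assumption: not scoped to specific resources
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be reviewed before attaching it to a role.
    print(json.dumps(build_dashboard_policy(), indent=2))
```

Review the printed JSON against your own security requirements before attaching it to the role you use in the AWS DMS console.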

Enhanced monitoring dashboard views

The AWS DMS enhanced monitoring dashboard has two types of views: Tasks and Replication instance views.

Tasks view

The Tasks view in the enhanced dashboard of AWS DMS provides a comprehensive overview of your data migration tasks. This view presents a centralized location where you can monitor and analyze various aspects of your tasks through intuitive charts and visualizations.

The Tasks view offers insights into key metrics and statistics related to your migration tasks, which enables you to gain a better understanding of their performance, progress, and overall health. Following are some of the key features and benefits of the Tasks view:

  • Task status summary: This chart displays the distribution of your tasks across different status categories, such as running, stopped, failed, or completed. You can quickly identify tasks that require attention and take appropriate actions.

  • Performance metrics: These charts illustrate the performance of your tasks, including metrics such as throughput, latency, CPU utilization, and more. These metrics help you identify potential bottlenecks and optimize your migration process.

  • Error analysis: In case of task failures or errors, the Tasks view provides detailed information about the errors encountered and their associated log entries. This information can help you in troubleshooting and resolving issues more efficiently.

  • Historical trends: The dashboard incorporates historical data, allowing you to analyze how your tasks have performed over a period of time. You can identify patterns, track progress, and make informed decisions based on these historical trends.

  • Filtering and sorting: The Tasks view enables you to filter and sort tasks based on various criteria, such as task name, tags, or specific time ranges. This flexibility allows you to focus on the tasks or aspects that are most relevant to your needs.

The following list describes the metrics that you can see in the Tasks view:

  • Full load throughput bandwidth - source: Represents incoming data transmitted from a full load from the source in KB per second.

  • Full load throughput bandwidth - target: Represents outgoing data transmitted from a full load for the target in KB per second.

  • Full load throughput rows - source: Represents incoming changes from a full load from the source in rows per second.

  • Full load throughput rows - target: Represents outgoing changes from a full load for the target in rows per second.

  • CDC throughput bandwidth - source: Represents network bandwidth for the source in KB per second.

    CDC throughput bandwidth records bandwidth on sampling points. If no network traffic is found, the value is zero. Because CDC doesn't issue long-running transactions, network traffic might not be recorded.

  • CDC throughput bandwidth - target: Represents network bandwidth for the target in KB per second.

    CDC throughput bandwidth records bandwidth on sampling points. If no network traffic is found, the value is zero. Because CDC doesn't issue long-running transactions, network traffic might not be recorded.

  • CDC throughput rows - source: Represents incoming task changes from the source in rows per second.

  • CDC throughput rows - target: Represents outgoing task changes for the target in rows per second.

  • CDC latency - source: Represents the gap, in seconds, between the last event captured from the source endpoint and the current system timestamp of the AWS DMS instance. If no changes have been captured from the source due to task scoping, AWS DMS sets this value to zero.

  • CDC latency - target: Represents the gap, in seconds, between the timestamp of the first event waiting to commit on the target and the current timestamp of the AWS DMS instance. This gap occurs when there are transactions that the target hasn't handled. Otherwise, if all transactions are applied, target latency is the same as source latency. Target latency should never be smaller than the source latency.

  • CPU utilization: Represents the percentage of CPU being used by a task across multiple cores. The semantics of task CPUUtilization are slightly different from the semantics of replication instance CPUUtilization. If 1 vCPU is fully used, the value is 100%, but if multiple vCPUs are in use, the value can exceed 100%.

  • Memory usage: Represents the control group (cgroup) memory.usage_in_bytes consumed by a task. AWS DMS uses cgroups to control the usage of system resources, such as memory and CPU. This metric indicates a task's memory usage, in megabytes, within the cgroup allocated for that task.

    The cgroup limits are based on the resources available for your AWS DMS replication instance class. memory.usage_in_bytes consists of resident set size (RSS), cache, and swap components of memory. The operating system can reclaim cache memory if needed. We recommend that you also monitor the replication instance metric, AvailableMemory.

    AWS DMS raises this metric against the combined dimensions of ReplicationInstanceIdentifier and ReplicationTaskIdentifier in the CloudWatch console. Use the ReplicationInstanceIdentifier, ReplicationTaskIdentifier category to view this metric.

  • Validation record count: This chart is visible only if validation is enabled for the AWS DMS task. This is a combination of validation metrics available for the AWS DMS task, which includes the following:

    • ValidationSucceededRecordCount – Number of rows that AWS DMS validated per minute.

    • ValidationAttemptedRecordCount – Number of rows for which validation was attempted per minute.

    • ValidationFailedOverallCount – Number of rows where validation failed.

    • ValidationSuspendedOverallCount – Number of rows where validation was suspended.

    • ValidationPendingOverallCount – Number of rows where the validation is still pending.
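
The same task metrics can be read outside the dashboard through the GetMetricData API. The following is a minimal sketch that builds the query entries for the two CDC latency metrics and derives how far the target trails the source. It assumes the CloudWatch metric names CDCLatencySource and CDCLatencyTarget in the AWS/DMS namespace. The identifier values, the 60-second period, the Average statistic, and the helper names are placeholders chosen for illustration; check the dimension values shown in the CloudWatch console for your own task.

```python
NAMESPACE = "AWS/DMS"


def build_latency_queries(instance_id, task_id, period_seconds=60):
    """Build MetricDataQueries entries for CDCLatencySource and CDCLatencyTarget."""
    # Task metrics are published against both dimensions together.
    dimensions = [
        {"Name": "ReplicationInstanceIdentifier", "Value": instance_id},
        {"Name": "ReplicationTaskIdentifier", "Value": task_id},
    ]
    queries = []
    for query_id, metric_name in (
        ("source", "CDCLatencySource"),
        ("target", "CDCLatencyTarget"),
    ):
        queries.append(
            {
                "Id": query_id,  # query IDs must start with a lowercase letter
                "MetricStat": {
                    "Metric": {
                        "Namespace": NAMESPACE,
                        "MetricName": metric_name,
                        "Dimensions": dimensions,
                    },
                    "Period": period_seconds,
                    "Stat": "Average",
                },
            }
        )
    return queries


def apply_lag_seconds(source_latency, target_latency):
    """Return the part of target latency that source latency doesn't explain.

    Target latency should never be smaller than source latency, so the
    result is clamped at zero to absorb sampling noise.
    """
    return max(0.0, target_latency - source_latency)


if __name__ == "__main__":
    # Placeholder identifiers (hypothetical).
    for query in build_latency_queries("my-instance", "my-task"):
        print(query["Id"], query["MetricStat"]["Metric"]["MetricName"])
    # With boto3 installed and credentials configured, the list could be passed as:
    #   boto3.client("cloudwatch").get_metric_data(
    #       MetricDataQueries=queries, StartTime=start, EndTime=end)
    print(apply_lag_seconds(5.0, 12.0))
```

A lag that stays well above zero suggests that changes are being captured faster than the target applies them.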

In addition to the preceding metrics, you can customize the Tasks view and include additional metrics by adding them as widgets. For information about these metrics, we recommend you review the following documentation:

Replication instance view

The Replication instance view provides a comprehensive overview of your replication instances, allowing you to monitor and manage your data replication infrastructure effectively. This view presents a centralized location where you can analyze various aspects of your replication instances through intuitive charts and visualizations.

The Replication instance view offers insights into key metrics and statistics related to your replication instances, which enables you to gain a better understanding of their performance, resource utilization, and overall health. Following are some of the key features and benefits of the Replication instance view:

  • Instance status summary: This chart displays details of the selected replication instance, such as Availability Zones, instance class, engine version, allocated storage, and current health status.

  • Resource utilization: The dashboard presents charts that illustrate the resource utilization of your replication instances, including metrics such as CPU, memory, and disk usage. These metrics help you identify potential resource constraints and optimize your replication infrastructure.

  • Historical trends: The dashboard incorporates historical data, allowing you to analyze how your replication instances have performed over a period of time. You can identify patterns, track changes, and make informed decisions based on these historical trends.

  • Filtering and sorting: The Replication instance view enables you to filter and sort replication instances based on various criteria, such as instance name or specific time ranges. This flexibility allows you to focus on the instances or aspects that are most relevant to your needs.

The following list describes the metrics that you can see in the Replication instance view:

  • CloudWatch alarms: Represents the summary of the alarms in the AWS/DMS namespace.

  • CPU utilization: Represents the percentage of CPU being used by a task across multiple cores. The semantics of task CPUUtilization are slightly different from the semantics of replication instance CPUUtilization. If 1 vCPU is fully used, the value is 100%, but if multiple vCPUs are in use, the value can exceed 100%.

  • Memory usage by tasks per instance: Represents the resident set size (RSS) occupied by a task. It indicates the portion of memory occupied by a task held in main memory (RAM). Because parts of the occupied memory are paged out, or parts of the executable are never loaded, MemoryUsage doesn’t include memory held in swap space or file system.

  • Memory: Represents the amount of memory that is available, in use, or can be freed for use, and the amount of swap space used.

  • Available memory: Represents an estimate of how much memory is available for starting new applications, without swapping.

  • Free memory: Represents the amount of physical memory available for use by applications, page cache, and for the kernel’s own data structures.

  • Freeable memory: Freeable memory is not an indication of the actual free memory available. It is memory that is currently in use but can be freed and used for other purposes. It is a combination of the buffers and cache in use on the replication instance.

  • Swap usage: Represents the amount of swap space used on the replication instance.

  • Free storage: Represents the amount of available storage space in bytes.
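
The CloudWatch alarms summary in the list above can be approximated from a DescribeAlarms response. The following minimal sketch tallies metric alarms in the AWS/DMS namespace by state. How the dashboard computes its own summary isn't described here, so treat this as one possible reading. The sample response is hand-written with hypothetical alarm names, and the function name is an assumption.

```python
from collections import Counter

# Hand-written sample shaped like a DescribeAlarms response (hypothetical alarms).
SAMPLE_RESPONSE = {
    "MetricAlarms": [
        {"AlarmName": "dms-low-storage", "Namespace": "AWS/DMS", "StateValue": "ALARM"},
        {"AlarmName": "dms-high-cpu", "Namespace": "AWS/DMS", "StateValue": "OK"},
        {"AlarmName": "ec2-high-cpu", "Namespace": "AWS/EC2", "StateValue": "ALARM"},
    ]
}


def summarize_dms_alarms(describe_alarms_response, namespace="AWS/DMS"):
    """Count metric alarms in the given namespace, grouped by StateValue."""
    counts = Counter()
    for alarm in describe_alarms_response.get("MetricAlarms", []):
        # Alarms on other namespaces are ignored.
        if alarm.get("Namespace") == namespace:
            counts[alarm.get("StateValue", "UNKNOWN")] += 1
    return dict(counts)


if __name__ == "__main__":
    print(summarize_dms_alarms(SAMPLE_RESPONSE))
```

The number of active alarms corresponds to the count under the ALARM state.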

In addition to the preceding metrics, you can customize the Replication instance view and include additional metrics by adding them as widgets. For information about these metrics, we recommend you review the following documentation:

Retention of enhanced monitoring metrics

By default, enhanced monitoring metrics follow the retention policy for CloudWatch metrics. For information about changing the retention policy, see Change log data retention in CloudWatch Logs in the Amazon CloudWatch User Guide.
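
When you analyze historical trends, the CloudWatch retention schedule limits how fine-grained older data points can be. The following minimal sketch picks the finest period still retained for data of a given age. It assumes the standard CloudWatch schedule for metrics published at one-minute or coarser resolution (1-minute data points for 15 days, 5-minute for 63 days, 1-hour for 455 days); verify these values in the Amazon CloudWatch User Guide. The function name is hypothetical.

```python
from datetime import timedelta

# (retention window, period in seconds), finest resolution first.
# Assumed from the standard CloudWatch metrics retention schedule.
RETENTION_SCHEDULE = [
    (timedelta(days=15), 60),
    (timedelta(days=63), 300),
    (timedelta(days=455), 3600),
]


def finest_available_period(age):
    """Return the smallest period (seconds) retained for data of this age, or None."""
    for window, period in RETENTION_SCHEDULE:
        if age <= window:
            return period
    return None  # older than the longest retention window


if __name__ == "__main__":
    for days in (1, 30, 100, 500):
        print(days, finest_available_period(timedelta(days=days)))
```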