Explore your data using visualization techniques

Note

SageMaker Canvas visualizations are available only for models built on tabular datasets. Multi-category text prediction models are not supported.

With Amazon SageMaker Canvas, you can explore and visualize your data before you build your ML models. You can create scatter plots, bar charts, and box plots to understand your data and to discover relationships between features that could affect model accuracy.

In the Build tab of the SageMaker Canvas application, choose Data visualizer to begin creating your visualizations.

Visualizations are drawn from a random sample of your dataset, and you can change the size of that sample. A sample that is too large might slow down your visualizations, so choose a size that balances detail against responsiveness. To change the sample size, use the following procedure.

  1. Choose Visualization sample.

  2. Use the slider to select your desired sample size.

  3. Choose Update to confirm the change to your sample size.
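Canvas performs the sampling for you, so no code is required. To illustrate the idea, the following is a minimal Python sketch of drawing a uniform random sample from a dataset. The function name `sample_rows`, the `seed` parameter, and the example rows are hypothetical and are not part of Canvas.

```python
import random


def sample_rows(rows, sample_size, seed=None):
    """Return a uniform random sample of rows, drawn without replacement.

    Illustration only: this is not how Canvas implements its sampling.
    """
    rng = random.Random(seed)
    # Cap the request at the dataset size so that small datasets
    # are returned in full (in shuffled order).
    k = min(sample_size, len(rows))
    return rng.sample(rows, k)


# Hypothetical dataset of 1,000 rows.
rows = [{"id": i, "value": i * 2} for i in range(1000)]
subset = sample_rows(rows, 100, seed=0)
print(len(subset))
```

A smaller sample gives the plotting step fewer points to draw, which is why a very large sample can slow a visualization down.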

Note

Certain visualization techniques require columns of a specific data type. For example, you can use only numeric columns for the x- and y-axes of scatter plots.
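If you want to check ahead of time which columns of a CSV-style dataset could serve as scatter plot axes, a rough check such as the following can help. This is a sketch under assumptions: it treats a column as numeric when every non-missing value parses as a float, which might differ from how Canvas infers data types. The helper names and the `Age` and `MonthlyIncome` columns are hypothetical.

```python
def is_numeric_column(values):
    """Return True if every non-missing value parses as a float."""
    present = [v for v in values if v not in (None, "")]
    if not present:
        return False  # An all-missing column is not treated as numeric.
    for v in present:
        try:
            float(v)
        except (TypeError, ValueError):
            return False
    return True


def numeric_columns(rows):
    """List the column names whose values all look numeric."""
    columns = list(rows[0].keys()) if rows else []
    return [c for c in columns if is_numeric_column([r.get(c) for r in rows])]


# Hypothetical rows, as a CSV reader might return them (all values are strings).
rows = [
    {"Age": "41", "MonthlyIncome": "5993", "Department": "Sales"},
    {"Age": "49", "MonthlyIncome": "", "Department": "Research"},
]
print(numeric_columns(rows))
```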

Scatter plot

To create a scatter plot with your dataset, choose Scatter plot in the Visualization panel. From the Columns section, choose the features that you want to plot on the x- and y-axes. You can drag and drop columns onto the axes or, after a column has been dropped onto an axis, choose a column from that axis's list of supported columns.

You can use Color by to color the data points on the plot with a third feature. You can also use Group by to group the data into separate plots based on a fourth feature.

The following image shows a scatter plot that uses Color by and Group by. In this example, each data point is colored by the MaritalStatus feature, and grouping by the Department feature produces a separate scatter plot for each department.

Screenshot of a scatter plot in the Data visualizer view of the Canvas application.
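Conceptually, Group by partitions the data into one plot per unique value, and Color by partitions the points within each plot. The following Python sketch shows that two-level partitioning. It is an illustration, not Canvas code. MaritalStatus and Department come from the example above, while the function name and the `Age` and `MonthlyIncome` axis columns are hypothetical.

```python
from collections import defaultdict


def split_for_scatter(rows, x, y, color_by, group_by):
    """Map each group value to a dict of color value -> list of (x, y) points."""
    panels = defaultdict(lambda: defaultdict(list))
    for row in rows:
        point = (float(row[x]), float(row[y]))
        # The outer key selects the plot and the inner key selects the color.
        panels[row[group_by]][row[color_by]].append(point)
    return panels


# Hypothetical rows for illustration.
rows = [
    {"Age": "41", "MonthlyIncome": "5993", "MaritalStatus": "Single", "Department": "Sales"},
    {"Age": "49", "MonthlyIncome": "5130", "MaritalStatus": "Married", "Department": "Research"},
    {"Age": "37", "MonthlyIncome": "2090", "MaritalStatus": "Single", "Department": "Research"},
]
panels = split_for_scatter(rows, "Age", "MonthlyIncome", "MaritalStatus", "Department")
for department, by_color in panels.items():
    print(department, dict(by_color))
```

Each outer key corresponds to one scatter plot, and each inner key corresponds to one color within that plot.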

Bar chart

To create a bar chart with your dataset, choose Bar chart in the Visualization panel. From the Columns section, choose the features that you want to plot on the x- and y-axes. You can drag and drop columns onto the axes or, after a column has been dropped onto an axis, choose a column from that axis's list of supported columns.

You can use Group by to group the bar chart by a third feature. You can also use Stack by to divide each bar vertically into shaded segments, one for each unique value of a fourth feature.

The following image shows a bar chart that uses Group by and Stack by. In this example, the bar chart is grouped by the MaritalStatus feature and stacked by the JobLevel feature. For each JobRole on the x-axis, there is a separate bar for each unique category of the MaritalStatus feature, and every bar is stacked vertically by the JobLevel feature.

Screenshot of a bar chart in the Data visualizer view of the Canvas application.
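The relationship between the x-axis, Group by, and Stack by can be sketched as a nested tally: one bar per combination of x-axis value and group value, and one segment per stack value within that bar. The following Python sketch counts rows to size each segment. That aggregation is an assumption made for illustration, because how Canvas aggregates the y-axis feature is not described here. The function name and the example rows are hypothetical.

```python
from collections import Counter


def stacked_bar_counts(rows, x, group_by, stack_by):
    """Map (x value, group value) to a dict of stack value -> row count."""
    counts = Counter((r[x], r[group_by], r[stack_by]) for r in rows)
    bars = {}
    for (x_val, group_val, stack_val), n in counts.items():
        # One bar per (x, group) pair, and one segment per stack value.
        bars.setdefault((x_val, group_val), {})[stack_val] = n
    return bars


# Hypothetical rows that use the feature names from the example above.
rows = [
    {"JobRole": "Manager", "MaritalStatus": "Single", "JobLevel": "3"},
    {"JobRole": "Manager", "MaritalStatus": "Single", "JobLevel": "4"},
    {"JobRole": "Manager", "MaritalStatus": "Married", "JobLevel": "3"},
    {"JobRole": "Analyst", "MaritalStatus": "Single", "JobLevel": "1"},
    {"JobRole": "Analyst", "MaritalStatus": "Single", "JobLevel": "1"},
]
bars = stacked_bar_counts(rows, "JobRole", "MaritalStatus", "JobLevel")
for bar, segments in bars.items():
    print(bar, segments)
```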

Box plot

To create a box plot with your dataset, choose Box plot in the Visualization panel. From the Columns section, choose the features that you want to plot on the x- and y-axes. You can drag and drop columns onto the axes or, after a column has been dropped onto an axis, choose a column from that axis's list of supported columns.

You can use Group by to group the box plots by a third feature.

The following image shows a box plot that uses Group by. In this example, the x- and y-axes show JobLevel and JobSatisfaction, respectively, and the colored box plots are grouped by the Department feature.

Screenshot of a box plot in the Data visualizer view of the Canvas application.
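A box plot summarizes the distribution of the y-axis feature for each x-axis value, and Group by splits that summary further. The following Python sketch computes a five-number summary (minimum, first quartile, median, third quartile, maximum) for each combination of x-axis value and group. It is an illustration under assumptions: the quartile method, and whether whiskers extend to the minimum and maximum, might differ from what Canvas draws. The function names and the example rows are hypothetical.

```python
import statistics
from collections import defaultdict


def box_stats(values):
    """Five-number summary for a list of at least two numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


def grouped_box_stats(rows, x, y, group_by):
    """Map (x value, group value) to the five-number summary of y."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[(r[x], r[group_by])].append(float(r[y]))
    # Skip buckets with fewer than two values, which have no quartiles.
    return {key: box_stats(vals) for key, vals in buckets.items() if len(vals) >= 2}


# Hypothetical rows that use the feature names from the example above.
rows = [
    {"JobLevel": "1", "JobSatisfaction": str(s), "Department": "Sales"}
    for s in (1, 2, 3, 4, 5)
] + [{"JobLevel": "2", "JobSatisfaction": "4", "Department": "Research"}]

stats = grouped_box_stats(rows, "JobLevel", "JobSatisfaction", "Department")
print(stats)
```

Each key in the result corresponds to one box in the plot: one x-axis position combined with one Group by value.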