Explore your data using visualization techniques
Note
You can only use SageMaker Canvas visualizations for models built on tabular datasets. Multi-category text prediction models are also excluded.
With Amazon SageMaker Canvas, you can explore and visualize your data to gain advanced insights before building your ML models. You can create scatter plots, bar charts, and box plots, which help you understand your data and discover relationships between features that could affect model accuracy.
In the Build tab of the SageMaker Canvas application, choose Data visualizer to begin creating your visualizations.
You can change the visualization sample size to adjust the size of the random sample taken from your dataset. A sample that is too large can degrade the performance of your data visualizations, so choose a sample size that keeps the visualizations responsive. To change the sample size, use the following procedure.
- Choose Visualization sample.
- Use the slider to select your desired sample size.
- Choose Update to confirm the change to your sample size.
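Canvas draws the visualization sample for you. The sketch below only illustrates what a fixed-size random sample of a dataset means. It is a minimal Python sketch that makes two assumptions: rows are dicts, and sampling is uniform and without replacement. The helper `visualization_sample` and the `EmployeeId` column are hypothetical and not part of any SageMaker Canvas API.

```python
import random
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

def visualization_sample(rows: List[Row], sample_size: int, seed: Optional[int] = None) -> List[Row]:
    """Return a uniform random sample of rows, drawn without replacement.

    Hypothetical helper: it mimics the idea of a visualization sample,
    not how SageMaker Canvas actually draws one.
    """
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    rng = random.Random(seed)  # seeded generator so the sample is reproducible
    # A sample can never be larger than the dataset itself.
    return rng.sample(rows, min(sample_size, len(rows)))

# Usage: a 1,000-row dataset sampled down to 100 rows.
dataset = [{"EmployeeId": i} for i in range(1000)]
sample = visualization_sample(dataset, 100, seed=42)
print(len(sample))  # 100
```

A smaller sample means fewer points to draw, which is why a smaller sample size keeps the charts responsive. The trade-off is that rare values are less likely to appear in the sample.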
Note
Certain visualization techniques require columns of a specific data type. For example, you can use only numeric columns for the x- and y-axes of scatter plots.
Scatter plot
To create a scatter plot with your dataset, choose Scatter plot in the Visualization panel. Choose the features you want to plot on the x- and y-axes from the Columns section. You can drag and drop columns onto the axes. After a column is on an axis, you can choose a different column from the list of supported columns.
You can use Color by to color the data points on the plot with a third feature. You can also use Group by to group the data into separate plots based on a fourth feature.
The following image shows a scatter plot that uses Color by and Group by. In this example, each data point is colored by the MaritalStatus feature, and grouping by the Department feature results in a separate scatter plot for the data points of each department.
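The sketch below is a rough offline analogue of the scatter plot described above, written in Python. It shows how Color by and Group by reshape the data. Each Group by value becomes its own panel, and each point carries its Color by value for a plotting library to map to a color. It also enforces the numeric-axis rule from the earlier note. The helper `scatter_panels` and the `Age` and `MonthlyIncome` columns are hypothetical. This is not how Canvas implements the chart.

```python
from collections import defaultdict
from numbers import Real
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Point = Tuple[float, float, Any]  # (x, y, color value)

def is_numeric(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)

def scatter_panels(rows: List[Row], x: str, y: str,
                   color_by: Optional[str] = None,
                   group_by: Optional[str] = None) -> Dict[Any, List[Point]]:
    """Split rows into one scatter panel per Group by value.

    Each point keeps its Color by value. Without group_by, every point
    lands in a single panel keyed by None.
    """
    panels: Dict[Any, List[Point]] = defaultdict(list)
    for row in rows:
        # Scatter axes accept only numeric columns.
        if not (is_numeric(row[x]) and is_numeric(row[y])):
            raise TypeError(f"scatter axes need numeric columns: {x!r}, {y!r}")
        color = row[color_by] if color_by else None
        panel = row[group_by] if group_by else None
        panels[panel].append((float(row[x]), float(row[y]), color))
    return dict(panels)

rows = [
    {"Age": 34, "MonthlyIncome": 5200, "MaritalStatus": "Married", "Department": "Sales"},
    {"Age": 28, "MonthlyIncome": 3100, "MaritalStatus": "Single", "Department": "Sales"},
    {"Age": 45, "MonthlyIncome": 9800, "MaritalStatus": "Divorced", "Department": "Research & Development"},
]
panels = scatter_panels(rows, "Age", "MonthlyIncome", color_by="MaritalStatus", group_by="Department")
for department, points in panels.items():
    print(department, points)
```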
Bar chart
To create a bar chart with your dataset, choose Bar chart in the Visualization panel. Choose the features you want to plot on the x- and y-axes from the Columns section. You can drag and drop columns onto the axes. After a column is on an axis, you can choose a different column from the list of supported columns.
You can use Group by to group the bar chart by a third feature. You can use Stack by to split each bar vertically into segments, one for each unique value of a fourth feature.
The following image shows a bar chart that uses Group by and Stack by. In this example, the bar chart is grouped by the MaritalStatus feature and stacked by the JobLevel feature. For each JobRole on the x-axis, there is a separate bar for each unique category in the MaritalStatus feature, and every bar is vertically stacked by the JobLevel feature.
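The bar chart described above nests three categorical levels: x value, then Group by value, then Stack by value. The Python sketch below shows that nesting as a data structure. It assumes each stacked segment shows the sum of the y column. Canvas may aggregate differently, so treat the aggregation as an assumption. The helper `stacked_grouped_bars`, the `MonthlyIncome` y column, and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]

def stacked_grouped_bars(rows: List[Row], x: str, y: str,
                         group_by: str, stack_by: str) -> Dict[Any, Dict[Any, Dict[Any, float]]]:
    """Aggregate rows as x value -> group value -> stack value -> segment height.

    Assumption: a segment's height is the sum of the y column for its rows.
    """
    bars = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for row in rows:
        bars[row[x]][row[group_by]][row[stack_by]] += row[y]
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {xv: {gv: dict(stacks) for gv, stacks in groups.items()}
            for xv, groups in bars.items()}

rows = [
    {"JobRole": "Sales Executive", "MaritalStatus": "Married", "JobLevel": 2, "MonthlyIncome": 5000},
    {"JobRole": "Sales Executive", "MaritalStatus": "Married", "JobLevel": 3, "MonthlyIncome": 7000},
    {"JobRole": "Sales Executive", "MaritalStatus": "Single", "JobLevel": 2, "MonthlyIncome": 4500},
    {"JobRole": "Research Scientist", "MaritalStatus": "Single", "JobLevel": 1, "MonthlyIncome": 3000},
    {"JobRole": "Research Scientist", "MaritalStatus": "Single", "JobLevel": 1, "MonthlyIncome": 3200},
]
bars = stacked_grouped_bars(rows, "JobRole", "MonthlyIncome", "MaritalStatus", "JobLevel")
for job_role, groups in bars.items():
    for status, stacks in groups.items():
        # One bar per (JobRole, MaritalStatus); its stacked segments are keyed by JobLevel.
        print(job_role, status, stacks, "total:", sum(stacks.values()))
```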
Box plot
To create a box plot with your dataset, choose Box plot in the Visualization panel. Choose the features you want to plot on the x- and y-axes from the Columns section. You can drag and drop columns onto the axes. After a column is on an axis, you can choose a different column from the list of supported columns.
You can use Group by to group the box plots by a third feature.
The following image shows a box plot that uses Group by. In this example, the x- and y-axes show JobLevel and JobSatisfaction, respectively, and the colored box plots are grouped by the Department feature.
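In the box plot described above, each box summarizes the y values for one x value and one Group by value. The Python sketch below computes the five-number summary a basic box plot draws. It assumes two conventions: whiskers at the minimum and maximum, and quartiles from `statistics.quantiles` with the `"inclusive"` method. Canvas might instead use other whisker or quartile conventions, such as 1.5 × IQR whiskers. The helpers `box_stats` and `grouped_box_plots` and the sample rows are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

def box_stats(values: List[float]) -> Dict[str, float]:
    """Five-number summary for one box (whiskers assumed at min and max)."""
    data = sorted(values)
    if len(data) == 1:
        # quantiles() needs at least two points on Python versions before 3.13.
        v = float(data[0])
        return {"min": v, "q1": v, "median": v, "q3": v, "max": v}
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": float(data[0]), "q1": q1, "median": median, "q3": q3, "max": float(data[-1])}

def grouped_box_plots(rows: List[Row], x: str, y: str, group_by: str) -> Dict[Tuple[Any, Any], Dict[str, float]]:
    """One box per (x value, group value) pair, summarizing the y column."""
    buckets: Dict[Tuple[Any, Any], List[float]] = defaultdict(list)
    for row in rows:
        buckets[(row[x], row[group_by])].append(row[y])
    return {key: box_stats(vals) for key, vals in buckets.items()}

rows = [{"JobLevel": 1, "Department": "Sales", "JobSatisfaction": s} for s in [1, 2, 2, 3, 4]]
rows.append({"JobLevel": 2, "Department": "Sales", "JobSatisfaction": 4})
boxes = grouped_box_plots(rows, "JobLevel", "JobSatisfaction", "Department")
for (job_level, department), stats in boxes.items():
    print(job_level, department, stats)
```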