Joining data
You can use the join interface in Amazon QuickSight to join objects from one or more data sources. By using Amazon QuickSight to join the data, you can merge disparate data without duplicating the data from different sources.
Types of joined datasets
A join is performed between two QuickSight logical tables, where each logical table contains information about how to fetch data. When editing a dataset in QuickSight, the join diagram at the top half of the page shows each logical table as a rectangular block.
There are two different types of joined datasets in QuickSight: same-source and cross-source. A dataset is considered same-source when it doesn't have any joins, or when all of the following conditions are met:
-
If any of the logical tables refer to a QuickSight data source:
-
All of the logical tables in this dataset must refer to the same QuickSight data source. This doesn't apply if two separate QuickSight data sources refer to the same underlying database. It must be the exact same QuickSight data source. For more information about using a single data source, see Creating a dataset using an existing data source.
-
-
If any of the logical tables refer to a QuickSight dataset that is a parent dataset:
-
The parent dataset must use direct query.
-
The parent dataset must refer to the same QuickSight data source.
-
If the above conditions aren't met, the dataset is considered a cross-source join.
Facts about joining datasets
Both same-source and cross-source dataset joins have the following limitations.
What's the maximum number of tables a joined dataset can contain?
All joined datasets can contain up to 32 tables.
How large can joined data be?
The maximum allowed size of a join is determined by the query mode and query engine that is used. The list below provides information about the different size limits for the tables to be joined. The size limit applies to all secondary tables combined. There are no join size limits for the primary table.
-
Same-source tables – When tables originate from a single query data source, QuickSight imposes no restrictions on the join size. This does not override join size limitations that the source query engine may have in place.
-
Cross-source datasets – This type of join contains tables from different data sources that aren't stored in SPICE. For these types of joins, QuickSight automatically identifies the largest table in the dataset. The combined size of all other secondary tables must be less than 1 GB.
-
Datasets stored in SPICE – This type of join contains tables that are all ingested into SPICE. The combined size of all secondary tables in this join cannot exceed 20 GB.
For more information about SPICE dataset size calculations, see Estimating the size of SPICE datasets.
Can a joined dataset use direct query?
Same-source datasets support direct query, assuming there are no other restrictions on using direct query. For example, S3 data sources don't support direct query, so a same-source S3 dataset must still use SPICE.
Cross-source datasets must use SPICE.
Can calculated fields be used in a join?
All joined datasets can use calculated fields, but calculated fields can't be used in any on-clauses.
Can geographical data be used in a join?
Same-source datasets support geographical data types, but geographical fields can't be used in any on-clauses.
Cross-source datasets don't support geographical data in any form.
For some examples of joining tables across data sources, see the Joining across data sources on Amazon QuickSight
Creating a join
Use the following procedure to join tables to use in a dataset. Before you begin, import or connect to your data. You can create a join between any of the data sources supported by Amazon QuickSight, except Internet of Things (IoT) data. For example, you can add comma-separated value (.csv) files, tables, views, SQL queries, or JSON objects in an Amazon S3 bucket.
To add one or more joins
-
Open the dataset that you want to work with.
-
(Optional) Before you get started, decide if you want to disable the autogenerated preview based on of a sample of your data. To turn that off, choose Auto-preview at top right. It's turned on by default.
-
If you haven't already chosen a query mode, choose Query mode.
Choose SPICE to store your dataset in SPICE, or choose Direct query to pull live data every time. If your dataset contains one ore more manually uploaded file, your dataset is automatically stored in SPICE.
If you choose SPICE, the data is ingested into QuickSight. Visuals that use the dataset run queries in SPICE, instead of on the database.
If you choose Direct query, the data isn't ingested into SPICE. Visuals that use the dataset run queries on the database, instead of in SPICE.
If you choose Query mode, make sure to set unique keys in the join, if applicable, to improve performance when loading visuals.
-
On the data preparation page, choose Add data.
-
In the Add data page that opens, choose one of the following options and complete the steps following:
-
Add data from a dataset:
-
Choose Dataset.
-
Select a dataset from the list.
-
Choose Select.
-
-
Add data from a data source:
-
Choose Data source.
-
Select a data source from the list.
-
Choose Select.
-
Select a table from the list.
-
Choose Select.
-
-
Create self-joins by adding a table multiple times. A counter appears after the name. An example is Product, Product (2), and Product (3). Field names in the Fields or Filters sections include the same counter so you can know which instance of the table a field came from.
-
Add a new file by choosing Upload a file, and then choose the file that you want to join.
-
-
(Optional) Choose Use custom SQL to open the query editor and write a query for a SQL data source.
-
(Optional) After you add data, interact with each table by choosing its menu icon. Rearrange the tables by dragging and dropping them.
An icon with red dots appears to indicate that you need to configure this join. Two red dots appear for joins that aren't yet configured. To create joins, choose the first join configuration icon.
-
(Optional) To change an existing join, reopen Join configuration by choosing the join icon between two tables.
The Join Configuration pane opens. On the join interface, specify the join type and the fields to use to join the tables.
-
At the bottom of the screen, you can see options to set a field in one table equal to a field in another table.
-
In the Join clauses section, choose the join column for each table.
(Optional) If the tables that you selected join on multiple columns, choose Add a new join clause. Doing this adds another row to the join clauses, so you can specify the next set of columns to join. Repeat this process until you have identified all of the join columns for the two data objects.
-
-
In the Join configuration pane, choose the kind of join to apply. If the join fields are a unique key for one or both tables, enable the unique key setting. Unique keys only apply to direct queries, not to SPICE data.
For more information about joins, see Join types.
-
Choose Apply to confirm your choices.
To cancel without making changes, choose Cancel.
-
The join icon in the workspace changes to show the new relationship.
-
(Optional) In the Fields section, you can use each field's menu to do one or more of the following:
-
Add a hierarchy to a geospatial field.
-
Include or Exclude the field.
-
Edit name & description of the field.
-
Change data type.
-
Add a calculation (a calculated field).
-
Restrict access to only me, so only you can see it. This can be helpful when you are adding fields to a dataset that's already in use.
-
-
(Optional) In the Filters section, you can add or edit filters. For more information, see Filtering data in Amazon QuickSight.
Join types
Amazon QuickSight supports the following join types:
-
Inner joins
-
Left and right outer joins
-
Full outer joins
Let's take a closer look at what these join types do with your data. For our
example data, we're using the following tables named widget
and
safety rating
.
SELECT * FROM safety-rating rating_id safety_rating 1 A+ 2 A 3 A- 4 B+ 5 B SELECT * FROM WIDGET widget_id widget safety_rating_id 1 WidgetA 3 2 WidgetB 1 3 WidgetC 1 4 WidgetD 2 5 WidgetE 6 WidgetF 5 7 WidgetG
Inner joins
Use an inner join ( ) when you want to see only the data where there is a match between two tables. For example, suppose that you perform an inner join on the safety-rating and widget tables.
In the following result set, widgets without safety ratings are removed, and safety ratings without associated widgets are removed. Only the rows that match perfectly are included.
SELECT * FROM safety-rating INNER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 3 A- 1 WidgetA 3 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 5 B 6 WidgetF 5
Left and right outer joins
These are also known as left or right outer joins. Use a left ( ) or right ( ) outer join when you want to see all the data from one table, and only the matching rows from the other table.
In a graphical interface, you can see which table is on the right or the left. In a SQL statement, the first table is considered to be on the left. Therefore, choosing a left outer join as opposed to a right outer join depends only on how the tables are laid out in your query tool.
For example, suppose that you perform a left outer join (
) on safety-rating
(the left table) and
widgets
(the right table). In this case, all
safety-rating
rows are returned, and only matching
widget
rows are returned. You can see blanks in the result set
where there is no matching data.
SELECT * FROM safety-rating LEFT OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 3 A- 1 WidgetA 3 4 B+ 5 B 6 WidgetF 5
If you instead use a right outer join (
), call the tables in the same order so
safety-rating
is on the left and widgets
is on the
right. In this case, only matching safety-rating
rows are returned,
and all widget
rows are returned. You can see blanks in the result
set where there is no matching data.
SELECT * FROM safety-rating RIGHT OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 3 A- 1 WidgetA 3 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 5 WidgetE 5 B 6 WidgetF 5 7 WidgetG
Full outer joins
These are sometimes called just outer joins, but this term can refer to either a left outer, right outer, or full outer join. To define the meaning, we use the complete name: full outer join.
Use a full outer join (
) to see data that matches, plus data from both tables that
doesn't match. This type of join includes all rows from both tables. For
example, if you perform a full outer join on the safety-rating
and
widget
tables, all rows are returned. The rows are aligned
where they matched, and all extra data is included on separate rows. You can see
blanks in the result set where there is no matching data.
SELECT * FROM safety-rating FULL OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 3 A- 1 WidgetA 3 4 B+ 5 B 6 WidgetF 5 5 WidgetE 7 WidgetG