Creating and interpreting a query plan
You can use the query plan to get information on the individual operations required to run a query. Before you work with a query plan, we recommend that you first understand how Amazon Redshift handles processing queries and creating query plans. For more information, see Query planning and execution workflow.
To create a query plan, run the EXPLAIN command followed by the actual query text. The query plan gives you the following information:
-
What operations the execution engine performs, reading the results from bottom to top.
-
What type of step each operation performs.
-
Which tables and columns are used in each operation.
-
How much data is processed in each operation, in terms of number of rows and data width in bytes.
-
The relative cost of the operation. Cost is a measure that compares the relative execution times of the steps within a plan. Cost does not provide any precise information about actual execution times or memory consumption, nor does it provide a meaningful comparison between execution plans. It does give you an indication of which operations in a query are consuming the most resources.
The EXPLAIN command doesn't actually run the query. It only shows the plan that Amazon Redshift runs if the query is run under current operating conditions. If you change the schema or data for a table and run ANALYZE again to update the statistical metadata, the query plan might be different.
The query plan output by EXPLAIN is a simplified, high-level view of query execution. It doesn't illustrate the details of parallel query processing. To see detailed information, run the query itself, and then get query summary information from the SVL_QUERY_SUMMARY or SVL_QUERY_REPORT view. For more information about using these views, see Analyzing the query summary.
The following example shows the EXPLAIN output for a simple GROUP BY query on the EVENT table:
explain select eventname, count(*) from event group by eventname; QUERY PLAN ------------------------------------------------------------------- XN HashAggregate (cost=131.97..133.41 rows=576 width=17) -> XN Seq Scan on event (cost=0.00..87.98 rows=8798 width=17)
EXPLAIN returns the following metrics for each operation:
- Cost
-
A relative value that is useful for comparing operations within a plan. Cost consists of two decimal values separated by two periods, for example
cost=131.97..133.41
. The first value, in this case 131.97, provides the relative cost of returning the first row for this operation. The second value, in this case 133.41, provides the relative cost of completing the operation. The costs in the query plan are cumulative as you read up the plan, so the HashAggregate cost in this example (131.97..133.41) includes the cost of the Seq Scan below it (0.00..87.98). - Rows
-
The estimated number of rows to return. In this example, the scan is expected to return 8798 rows. The HashAggregate operator on its own is expected to return 576 rows (after duplicate event names are discarded from the result set).
Note
The rows estimate is based on the available statistics generated by the ANALYZE command. If ANALYZE has not been run recently, the estimate is less reliable.
- Width
-
The estimated width of the average row, in bytes. In this example, the average row is expected to be 17 bytes wide.
EXPLAIN operators
This section briefly describes the operators that you see most often in the EXPLAIN output. For a complete list of operators, see EXPLAIN in the SQL Commands section.
Sequential scan operator
The sequential scan operator (Seq Scan) indicates a table scan. Seq Scan scans each column in the table sequentially from beginning to end and evaluates query constraints (in the WHERE clause) for every row.
Join operators
Amazon Redshift selects join operators based on the physical design of the tables being joined, the location of the data required for the join, and the specific requirements of the query itself.
-
Nested Loop
The least optimal join, a nested loop is used mainly for cross-joins (Cartesian products) and some inequality joins.
-
Hash Join and Hash
Typically faster than a nested loop join, a hash join and hash are used for inner joins and left and right outer joins. These operators are used when joining tables where the join columns are not both distribution keys and sort keys. The hash operator creates the hash table for the inner table in the join; the hash join operator reads the outer table, hashes the joining column, and finds matches in the inner hash table.
-
Merge Join
Typically the fastest join, a merge join is used for inner joins and outer joins. The merge join is not used for full joins. This operator is used when joining tables where the join columns are both distribution keys and sort keys, and when less than 20 percent of the joining tables are unsorted. It reads two sorted tables in order and finds the matching rows. To view the percent of unsorted rows, query the SVV_TABLE_INFO system table.
-
Spatial Join
Typically a fast join based on proximity of spatial data, used for
GEOMETRY
andGEOGRAPHY
data types.
Aggregate operators
The query plan uses the following operators in queries that involve aggregate functions and GROUP BY operations.
-
Aggregate
Operator for scalar aggregate functions such as AVG and SUM.
-
HashAggregate
Operator for unsorted grouped aggregate functions.
-
GroupAggregate
Operator for sorted grouped aggregate functions.
Sort operators
The query plan uses the following operators when queries have to sort or merge result sets.
-
Sort
Evaluates the ORDER BY clause and other sort operations, such as sorts required by UNION queries and joins, SELECT DISTINCT queries, and window functions.
-
Merge
Produces final sorted results according to intermediate sorted results that derive from parallel operations.
UNION, INTERSECT, and EXCEPT operators
The query plan uses the following operators for queries that involve set operations with UNION, INTERSECT, and EXCEPT.
-
Subquery
Used to run UNION queries.
-
Hash Intersect Distinct
Used to run INTERSECT queries.
-
SetOp Except
Used to run EXCEPT (or MINUS) queries.
Other operators
The following operators also appear frequently in EXPLAIN output for routine queries.
-
Unique
Removes duplicates for SELECT DISTINCT queries and UNION queries.
-
Limit
Processes the LIMIT clause.
-
Window
Runs window functions.
-
Result
Runs scalar functions that do not involve any table access.
-
Subplan
Used for certain subqueries.
-
Network
Sends intermediate results to the leader node for further processing.
-
Materialize
Saves rows for input to nested loop joins and some merge joins.
Joins in EXPLAIN
The query optimizer uses different join types to retrieve table data, depending on the structure of the query and the underlying tables. The EXPLAIN output references the join type, the tables used, and the way the table data is distributed across the cluster to describe how the query is processed.
Join type examples
The following examples show the different join types that the query optimizer can use. The join type used in the query plan depends on the physical design of the tables involved.
Example: Hash join two tables
The following query joins EVENT and CATEGORY on the CATID column. CATID is the distribution and sort key for CATEGORY but not for EVENT. A hash join is performed with EVENT as the outer table and CATEGORY as the inner table. Because CATEGORY is the smaller table, the planner broadcasts a copy of it to the compute nodes during query processing by using DS_BCAST_INNER. The join cost in this example accounts for most of the cumulative cost of the plan.
explain select * from category, event where category.catid=event.catid; QUERY PLAN ------------------------------------------------------------------------- XN Hash Join DS_BCAST_INNER (cost=0.14..6600286.07 rows=8798 width=84) Hash Cond: ("outer".catid = "inner".catid) -> XN Seq Scan on event (cost=0.00..87.98 rows=8798 width=35) -> XN Hash (cost=0.11..0.11 rows=11 width=49) -> XN Seq Scan on category (cost=0.00..0.11 rows=11 width=49)
Note
Aligned indents for operators in the EXPLAIN output sometimes indicate that those operations do not depend on each other and can start in parallel. In the preceding example, although the scan on the EVENT table and the hash operation are aligned, the EVENT scan must wait until the hash operation has fully completed.
Example: Merge join two tables
The following query also uses SELECT *, but it joins SALES and LISTING on the LISTID column, where LISTID has been set as both the distribution and sort key for both tables. A merge join is chosen, and no redistribution of data is required for the join (DS_DIST_NONE).
explain select * from sales, listing where sales.listid = listing.listid; QUERY PLAN ----------------------------------------------------------------------------- XN Merge Join DS_DIST_NONE (cost=0.00..6285.93 rows=172456 width=97) Merge Cond: ("outer".listid = "inner".listid) -> XN Seq Scan on listing (cost=0.00..1924.97 rows=192497 width=44) -> XN Seq Scan on sales (cost=0.00..1724.56 rows=172456 width=53)
The following example demonstrates the different types of joins within the same query. As in the previous example, SALES and LISTING are merge joined, but the third table, EVENT, must be hash joined with the results of the merge join. Again, the hash join incurs a broadcast cost.
explain select * from sales, listing, event where sales.listid = listing.listid and sales.eventid = event.eventid; QUERY PLAN ---------------------------------------------------------------------------- XN Hash Join DS_BCAST_INNER (cost=109.98..3871130276.17 rows=172456 width=132) Hash Cond: ("outer".eventid = "inner".eventid) -> XN Merge Join DS_DIST_NONE (cost=0.00..6285.93 rows=172456 width=97) Merge Cond: ("outer".listid = "inner".listid) -> XN Seq Scan on listing (cost=0.00..1924.97 rows=192497 width=44) -> XN Seq Scan on sales (cost=0.00..1724.56 rows=172456 width=53) -> XN Hash (cost=87.98..87.98 rows=8798 width=35) -> XN Seq Scan on event (cost=0.00..87.98 rows=8798 width=35)
Example: Join, aggregate, and sort
The following query runs a hash join of the SALES and EVENT tables, followed by aggregation and sort operations to account for the grouped SUM function and the ORDER BY clause. The initial sort operator runs in parallel on the compute nodes. Then the Network operator sends the results to the leader node, where the Merge operator produces the final sorted results.
explain select eventname, sum(pricepaid) from sales, event where sales.eventid=event.eventid group by eventname order by 2 desc; QUERY PLAN --------------------------------------------------------------------------------- XN Merge (cost=1002815366604.92..1002815366606.36 rows=576 width=27) Merge Key: sum(sales.pricepaid) -> XN Network (cost=1002815366604.92..1002815366606.36 rows=576 width=27) Send to leader -> XN Sort (cost=1002815366604.92..1002815366606.36 rows=576 width=27) Sort Key: sum(sales.pricepaid) -> XN HashAggregate (cost=2815366577.07..2815366578.51 rows=576 width=27) -> XN Hash Join DS_BCAST_INNER (cost=109.98..2815365714.80 rows=172456 width=27) Hash Cond: ("outer".eventid = "inner".eventid) -> XN Seq Scan on sales (cost=0.00..1724.56 rows=172456 width=14) -> XN Hash (cost=87.98..87.98 rows=8798 width=21) -> XN Seq Scan on event (cost=0.00..87.98 rows=8798 width=21)
Data redistribution
The EXPLAIN output for joins also specifies a method for how data is moved around a cluster to facilitate the join. This data movement can be either a broadcast or a redistribution. In a broadcast, the data values from one side of a join are copied from each compute node to every other compute node, so that every compute node ends up with a complete copy of the data. In a redistribution, participating data values are sent from their current slice to a new slice (possibly on a different node). Data is typically redistributed to match the distribution key of the other table participating in the join if that distribution key is one of the joining columns. If neither of the tables has distribution keys on one of the joining columns, either both tables are distributed or the inner table is broadcast to every node.
The EXPLAIN output also references inner and outer tables. The inner table is scanned first, and appears nearer the bottom of the query plan. The inner table is the table that is probed for matches. It is usually held in memory, is usually the source table for hashing, and if possible, is the smaller table of the two being joined. The outer table is the source of rows to match against the inner table. It is usually read from disk. The query optimizer chooses the inner and outer table based on database statistics from the latest run of the ANALYZE command. The order of tables in the FROM clause of a query doesn't determine which table is inner and which is outer.
Use the following attributes in query plans to identify how data is moved to facilitate a query:
-
DS_BCAST_INNER
A copy of the entire inner table is broadcast to all compute nodes.
-
DS_DIST_ALL_NONE
No redistribution is required, because the inner table has already been distributed to every node using DISTSTYLE ALL.
-
DS_DIST_NONE
No tables are redistributed. Collocated joins are possible because corresponding slices are joined without moving data between nodes.
-
DS_DIST_INNER
The inner table is redistributed.
-
DS_DIST_OUTER
The outer table is redistributed.
-
DS_DIST_ALL_INNER
The entire inner table is redistributed to a single slice because the outer table uses DISTSTYLE ALL.
-
DS_DIST_BOTH
Both tables are redistributed.