Understand Athena EXPLAIN statement results
This topic provides a brief guide to the operational terms used in Athena
EXPLAIN
statement results.
EXPLAIN statement output types
EXPLAIN
statement outputs can be one of two types:
-
Logical plan – Shows the logical plan that the SQL engine uses to execute a statement. The syntax for this option is
EXPLAIN
orEXPLAIN (TYPE LOGICAL)
. -
Distributed plan – Shows an execution plan in a distributed environment. The output shows fragments, which are processing stages. Each plan fragment is processed by one or more nodes. Data can be exchanged between the nodes that process the fragments. The syntax for this option is
EXPLAIN (TYPE DISTRIBUTED)
.In the output for a distributed plan, fragments (processing stages) are indicated by
Fragment
number
[fragment_type
], wherenumber
is a zero-based integer andfragment_type
specifies how the fragment is executed by the nodes. Fragment types, which provide insight into the layout of the data exchange, are described in the following table.Distributed plan fragment types Fragment type Description SINGLE
The fragment is executed on a single node. HASH
The fragment is executed on a fixed number of nodes. The input data is distributed using a hash function. ROUND_ROBIN
The fragment is executed on a fixed number of nodes. The input data is distributed in a round-robin fashion. BROADCAST
The fragment is executed on a fixed number of nodes. The input data is broadcast to all nodes. SOURCE
The fragment is executed on nodes where input splits are accessed.
Exchange
Exchange-related terms describe how data is exchanged between worker nodes. Transfers can be either local or remote.
- LocalExchange [
exchange_type
] -
Transfers data locally within worker nodes for different stages of a query. The value for
exchange_type
can be one of the logical or distributed exchange types as described later in this section. - RemoteExchange [
exchange_type
] -
Transfers data between worker nodes for different stages of a query. The value for
exchange_type
can be one of the logical or distributed exchange types as described later in this section.
Logical Exchange types
The following exchange types describe actions taken during the exchange phase of a logical plan.
-
GATHER
– A single worker node gathers output from all other worker nodes. For example, the last stage of a select query gathers results from all nodes and writes the results to Amazon S3. -
REPARTITION
– Sends the row data to a specific worker based on the partitioning scheme required to apply to the next operator. -
REPLICATE
– Copies the row data to all workers.
Distributed Exchange types
The following exchange types indicate the layout of the data when they are exchanged between nodes in a distributed plan.
-
HASH
– The exchange distributes data to multiple destinations using a hash function. -
SINGLE
– The exchange distributes data to a single destination.
Scanning
The following terms describe how data is scanned during a query.
- TableScan
-
Scans a table's source data from Amazon S3 or an Apache Hive connector and applies partition pruning generated from the filter predicate.
- ScanFilter
-
Scans a table's source data from Amazon S3 or a Hive connector and applies partition pruning generated from the filter predicate and from additional filter predicates not applied through partition pruning.
- ScanFilterProject
-
First, scans a table's source data from Amazon S3 or a Hive connector and applies partition pruning generated from the filter predicate and from additional filter predicates not applied through partition pruning. Then, modifies the memory layout of the output data into a new projection to improve performance of later stages.
Join
Joins data between two tables. Joins can be categorized by join type and by distribution type.
Join types
Join types define the way in which the join operation occurs.
CrossJoin – Produces the Cartesian product of the two tables joined.
InnerJoin – Selects records that have matching values in both tables.
LeftJoin – Selects all records from the left table and the matching records from the right table. If no match occurs, the result on the right side is NULL.
RightJoin – Selects all records from the right table, and the matching records from the left table. If no match occurs, the result on the left side is NULL.
FullJoin – Selects all records where there is a match in the left or right table records. The joined table contains all records from both the tables and fills in NULLs for missing matches on either side.
Note
For performance reasons, the query engine can rewrite a join query into a
different join type to produce the same results. For example, an inner join
query with predicate on one table can be rewritten into a
CrossJoin
. This pushes the predicate down to the scanning phase of
the table so that fewer data are scanned.
Join distribution types
Distribution types define how data is exchanged between worker nodes when the join operation is performed.
Partitioned – Both the left and right table are hash-partitioned across all worker nodes. Partitioned distribution consumes less memory in each node. Partitioned distribution can be much slower than replicated joins. Partitioned joins are suitable when you join two large tables.
Replicated – One table is hash-partitioned across all worker nodes and the other table is replicated to all worker nodes to perform the join operation. Replicated distribution can be much faster than partitioned joins, but it consumes more memory in each worker node. If the replicated table is too large, the worker node can experience an out-of-memory error. Replicated joins are suitable when one of the joined tables is small.