Query Execution Steps
----------------------
Step Each step is an individual operation in the explain plan. Steps can be combined to allow compute nodes to perform a query, join, or other database operation.
Segment Some number of steps that can be done by a single process. A segment is also a single
compilation unit executable by compute nodes. Each segment begins with a SCAN of table data and
ends either with a materialization step or some other network activity.
Stream A collection of segments that always begins with a SCAN of some data set and ends with a
materialization or blocking step. Materialization or blocking steps include HASH, AGG, SORT, and SAVE.
The last segment of a query returns the data. If the return set is aggregated or sorted, the compute nodes
each send their piece of the intermediate result to the leader node, which then merges the data so the
final result can be sent back to the requesting client.
To view the distribution style of a table, query the PG_CLASS system catalog table. The RELDISTSYLE
column indicates the distribution style for the table.
DS_DIST_NONE
No redistribution is required, because corresponding slices are collocated on the compute nodes.
You will typically have only one DS_DIST_NONE step, the join between the fact table and one
dimension table.
DS_DIST_ALL_NONE
No redistribution is required, because the inner join table used DISTSTYLE ALL. The entire table is
located on every node.
API Version 2012-12-01
104
Amazon Redshift Database Developer Guide
Evaluating the query plan
DS_DIST_INNER
The inner table is redistributed.
DS_BCAST_INNER
A copy of the entire inner table is broadcast to all the compute nodes.
DS_DIST_ALL_INNER
The entire inner table is redistributed to a single slice because the outer table uses DISTSTYLE ALL.
DS_DIST_BOTH
Both tables are redistributed.
Broadcasting and redistribution can be expensive steps in terms of query performance.
Uneven distribution, or data distribution skew, forces some nodes to do more work than others, which
limits query performance.
To check for distribution skew, query the SVV_DISKUSAGE system view.
num_values column gives the number of rows in that disk block, so sum(num_values) returns the number of rows on each slice.
DS_DIST_NONE and DS_DIST_ALL_NONE are good. They indicate that no distribution was required
for that step because all of the joins are collocated.
DS_DIST_INNER means that the step will probably have a relatively high cost because the inner table
is being redistributed to the nodes. DS_DIST_INNER indicates that the outer table is already properly
distributed on the join key. Set the inner table's distribution key to the join key to convert this to
DS_DIST_NONE. If distributing the inner table on the join key is not possible because the outer table is
not distributed on the join key, evaluate whether to use ALL distribution for the inner table. If the table is
relatively slow moving, that is, it is not updated frequently or extensively, and it is large enough to carry
a high redistribution cost, change the distribution style to ALL and test again. ALL distribution causes
increased load times, so when you retest, include the load time in your evaluation factors.
DS_DIST_ALL_INNER is not good. It means the entire inner table is redistributed to a single slice because
the outer table uses DISTSTYLE ALL, so that a copy of the entire outer table is located on each node.
This results in inefficient serial execution of the join on a single node instead taking advantage of parallel
execution using all of the nodes. DISTSTYLE ALL is meant to be used only for the inner join table. Instead,
specify a distribution key or use even distribution for the outer table.
DS_BCAST_INNER and DS_DIST_BOTH are not good. Usually these redistributions occur because the tables are not joined on their distribution keys.
If the fact table does not already have a distribution key,
specify the joining column as the distribution key for both tables. If the fact table already has a distribution
key on another column, you should evaluate whether changing the distribution key to collocate this join
will improve overall performance. If changing the distribution key of the outer table is not an optimal choice,
you can achieve collocation by specifying DISTSTYLE ALL for the inner table.
Query writing intermediate to disk
If IS_DISKBASED is true ("t") for any step, then that step wrote data to disk.
With a large data set, hashes, aggregates, and sorts are the relational operators that would be likely to write data to disk if the system does not have enough memory allocated for query processing.
STV_PARTITIONS table to determine disk space.
STV_BLOCKLIST - contains information about the number of blocks allocated to each table
STV_TBL_PERM - contains the table IDs for all of the permanent tables.
Compiled Code
The compiled code segments are stored in a least recently used (LRU) cache and shared across sessions in a cluster, so subsequent executions of the same query, even in different sessions and often even with different query parameters, will run faster because they can skip the initial generation and compilation steps.The LRU cache persists through cluster reboots, but is wiped by maintenance upgrades.
The execution engine generates different code for the JDBC connection protocols and for the ODBC and psql (libpq) connection protocols.
To see the effect of generating compiled code for a query, you can query the SVL_COMPILE system table.
Setting the JDBC fetch size parameter
For the best performance, set the fetch size to the highest value that does not lead to out of memory
errors. A lower fetch size value results in more server trips, which prolongs execution times. The server reserves resources, including the WLM query slot and associated memory, until the client retrieves the entire result set or the query is canceled. When you tune the fetch size appropriately, those resources are released more quickly, making them available to other queries
----------------------
Step Each step is an individual operation in the explain plan. Steps can be combined to allow compute nodes to perform a query, join, or other database operation.
Segment Some number of steps that can be done by a single process. A segment is also a single
compilation unit executable by compute nodes. Each segment begins with a SCAN of table data and
ends either with a materialization step or some other network activity.
Stream A collection of segments that always begins with a SCAN of some data set and ends with a
materialization or blocking step. Materialization or blocking steps include HASH, AGG, SORT, and SAVE.
The last segment of a query returns the data. If the return set is aggregated or sorted, the compute nodes
each send their piece of the intermediate result to the leader node, which then merges the data so the
final result can be sent back to the requesting client.
To view the distribution style of a table, query the PG_CLASS system catalog table. The RELDISTSYLE
column indicates the distribution style for the table.
DS_DIST_NONE
No redistribution is required, because corresponding slices are collocated on the compute nodes.
You will typically have only one DS_DIST_NONE step, the join between the fact table and one
dimension table.
DS_DIST_ALL_NONE
No redistribution is required, because the inner join table used DISTSTYLE ALL. The entire table is
located on every node.
API Version 2012-12-01
104
Amazon Redshift Database Developer Guide
Evaluating the query plan
DS_DIST_INNER
The inner table is redistributed.
DS_BCAST_INNER
A copy of the entire inner table is broadcast to all the compute nodes.
DS_DIST_ALL_INNER
The entire inner table is redistributed to a single slice because the outer table uses DISTSTYLE ALL.
DS_DIST_BOTH
Both tables are redistributed.
Broadcasting and redistribution can be expensive steps in terms of query performance.
Uneven distribution, or data distribution skew, forces some nodes to do more work than others, which
limits query performance.
To check for distribution skew, query the SVV_DISKUSAGE system view.
num_values column gives the number of rows in that disk block, so sum(num_values) returns the number of rows on each slice.
DS_DIST_NONE and DS_DIST_ALL_NONE are good. They indicate that no distribution was required
for that step because all of the joins are collocated.
DS_DIST_INNER means that the step will probably have a relatively high cost because the inner table
is being redistributed to the nodes. DS_DIST_INNER indicates that the outer table is already properly
distributed on the join key. Set the inner table's distribution key to the join key to convert this to
DS_DIST_NONE. If distributing the inner table on the join key is not possible because the outer table is
not distributed on the join key, evaluate whether to use ALL distribution for the inner table. If the table is
relatively slow moving, that is, it is not updated frequently or extensively, and it is large enough to carry
a high redistribution cost, change the distribution style to ALL and test again. ALL distribution causes
increased load times, so when you retest, include the load time in your evaluation factors.
DS_DIST_ALL_INNER is not good. It means the entire inner table is redistributed to a single slice because
the outer table uses DISTSTYLE ALL, so that a copy of the entire outer table is located on each node.
This results in inefficient serial execution of the join on a single node instead taking advantage of parallel
execution using all of the nodes. DISTSTYLE ALL is meant to be used only for the inner join table. Instead,
specify a distribution key or use even distribution for the outer table.
DS_BCAST_INNER and DS_DIST_BOTH are not good. Usually these redistributions occur because the tables are not joined on their distribution keys.
If the fact table does not already have a distribution key,
specify the joining column as the distribution key for both tables. If the fact table already has a distribution
key on another column, you should evaluate whether changing the distribution key to collocate this join
will improve overall performance. If changing the distribution key of the outer table is not an optimal choice,
you can achieve collocation by specifying DISTSTYLE ALL for the inner table.
Query writing intermediate to disk
If IS_DISKBASED is true ("t") for any step, then that step wrote data to disk.
With a large data set, hashes, aggregates, and sorts are the relational operators that would be likely to write data to disk if the system does not have enough memory allocated for query processing.
STV_PARTITIONS table to determine disk space.
STV_BLOCKLIST - contains information about the number of blocks allocated to each table
STV_TBL_PERM - contains the table IDs for all of the permanent tables.
Compiled Code
The compiled code segments are stored in a least recently used (LRU) cache and shared across sessions in a cluster, so subsequent executions of the same query, even in different sessions and often even with different query parameters, will run faster because they can skip the initial generation and compilation steps.The LRU cache persists through cluster reboots, but is wiped by maintenance upgrades.
The execution engine generates different code for the JDBC connection protocols and for the ODBC and psql (libpq) connection protocols.
To see the effect of generating compiled code for a query, you can query the SVL_COMPILE system table.
Setting the JDBC fetch size parameter
For the best performance, set the fetch size to the highest value that does not lead to out of memory
errors. A lower fetch size value results in more server trips, which prolongs execution times. The server reserves resources, including the WLM query slot and associated memory, until the client retrieves the entire result set or the query is canceled. When you tune the fetch size appropriately, those resources are released more quickly, making them available to other queries
Comments