Tip 77.29 - Watch Out For Correlated Subqueries

Breck Carter
Last modified: February 12, 1998
mail to: bcarter@bcarter.com

Tip 77: Sybase SQL Anywhere
Performance Tips & Techniques

29 - Watch Out For Correlated Subqueries

A correlated subquery is a nested select that refers to a column from the outer select. In some cases they can rank among the poorest performing SQL statements imaginable because the inner result set must be constructed for every single row that is a candidate for inclusion in the outer result set. If both the inner and outer result sets are large the amount of processing required can be huge.

Figure 29A shows two solutions for the same query: "Display the type and quantity for each row, together with the average quantity for that value of type." The first solution involves a correlated subquery that recalculates the average quantity corresponding to each row in the outer result set. The second solution involves the creation of a temporary table to hold average quantities by type, followed by a join to get the final answer.

Figure 29A - Correlated Subquery Versus Temporary Table

CREATE TABLE test1
( primary1 INTEGER NOT NULL,
type1 INTEGER NOT NULL,
quantity1 INTEGER NOT NULL,
PRIMARY KEY ( primary1 ) );
Test 1 - Correlated Subquery...
SELECT primary1,
type1,
quantity1,
( SELECT Avg ( quantity1 )
FROM test1 B
WHERE B.type1 = A.type1 )
FROM test1 A
WHERE A.primary1 <= 10;
Test 1 Results...
SELECT 10 rows: 13.81 seconds
SELECT 50 rows: 65.37 seconds
Test 2 - Create #test1, then SELECT...
SELECT type1,
Avg ( quantity1 ) as avg_quantity1
INTO #test1
FROM test1
GROUP BY type1;
SELECT A.primary1,
A.type1,
A.quantity1,
B.avg_quantity1
FROM test1 A, #test1 B
WHERE A.primary1 <= 10
AND A.type1 = B.type1;
Test 2 Results...
Fill #test1: 2.0 seconds
SELECT 10 rows: 0.6 seconds
Fill #test1: 2.9 seconds
SELECT 50 rows: 0.1 seconds

The correlated subquery solution in Figure 29A takes almost 14 seconds for ten rows and over 65 seconds for fifty rows. By contrast the second solution is not only much faster but it takes the same amount of time for either 10 rows or 50 rows: 3 seconds to both build the temporary table and do the join.

Correlated subqueries represent a challenge when you're trying to optimize a query. First of all, you should be absolutely sure there is a problem before trying to change anything. Run some tests, don't rely on the PLAN to predict a problem. For example SQL Anywhere used exactly the same execution strategy "PLAN> A (test1): B (seq)" for both solutions in Figure 29A but the actual run times were dramatically different.

Another challenge is that correlated subqueries aren't always evil things. Figure 29B shows two solutions to a very common query: "Find all the parent rows for which no child rows exist." The first solution uses a NOT EXISTS operator with a correlated subquery in the outer WHERE clause, whereas the second solution uses an outer join between the parent and child tables and calls IfNull() in the WHERE clause to select only childless parents.

Figure 29B - Correlated Subqueries Aren't Always Bad

CREATE TABLE parent1
( primary1 INTEGER NOT NULL,
data1 INTEGER NOT NULL,
PRIMARY KEY ( primary1 ) );
CREATE TABLE child1
( primary1 INTEGER NOT NULL,
primary2 INTEGER NOT NULL,
data1 INTEGER NOT NULL,
PRIMARY KEY ( primary1, primary2 ),
FOREIGN KEY foreign1 ( primary1 )
REFERENCES parent1 ( primary1 ) );
Test 1: Correlated Subquery...
SELECT COUNT(*)
FROM parent1
WHERE NOT EXISTS
( SELECT *
FROM child1
WHERE parent1.primary1 = child1.primary1 );
Test 1 Results...
PLAN> parent1 (seq): child1 (foreign1)
Estimate: I/O 15470
Actual: I/O 1043, time 4.89 seconds
Test 2: Outer Join and IfNull()...
SELECT COUNT(*)
FROM parent1 LEFT OUTER JOIN child1
ON parent1.primary1 = child1.primary1
WHERE IfNull ( child1.primary1, 1, 0 ) = 1;
Test 2 Results...
PLAN> parent1 (seq): child1 (foreign1)
Estimate: I/O 5055
Actual: I/O 1043, time 5.0 seconds

The two alternatives in Figure 29B require exactly the same number of I/O operations and almost the same amount of time. This indicates that SQL Anywhere may be using the same technique for both even though the I/O estimates are vastly different. The correlated subquery performs just fine and there is no real advantage to the outer join solution.
[Home] [Table of Contents] [Previous Section] [Next Section]

Tip 77: Sybase SQL AnywherePerformance Tips & Techniques

29 - Watch Out For Correlated Subqueries

Tip 77: Sybase SQL Anywhere
Performance Tips & Techniques