<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SQL in the Wild &#187; Performance</title>
	<atom:link href="http://sqlinthewild.co.za/index.php/category/sql-server/performance/feed/" rel="self" type="application/rss+xml" />
	<link>http://sqlinthewild.co.za</link>
	<description>A discussion on SQL Server</description>
	<lastBuildDate>Wed, 25 Apr 2012 14:45:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Converting OR to Union</title>
		<link>http://sqlinthewild.co.za/index.php/2011/07/05/converting-or-to-union/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/07/05/converting-or-to-union/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 15:30:00 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=963</guid>
		<description><![CDATA[When I looked at indexing for queries containing predicates combined with OR, it became clear that there are some restrictive requirements for indexes for the optimiser to consider using the indexes for seek operations. Each predicate (or set of predicates) combined with an OR must have a separate index All of those indexes must be [...]]]></description>
			<content:encoded><![CDATA[<p>When I looked at <a href="http://sqlinthewild.co.za/index.php/2011/05/03/indexing-for-ors/">indexing for queries containing predicates combined with OR</a>, it became clear that there are some restrictive requirements that indexes must meet before the optimiser will consider using them for seek operations.</p>
<ul>
<li>Each predicate (or set of predicates) combined with an OR must have a separate index</li>
<li>All of those indexes must be covering, or the row count of the concatenated result set must be low enough to make key lookups an option, as the optimiser does not appear to consider the possibility of doing key lookups for a subset of the predicates before concatenating the result sets.</li>
</ul>
<p>So what can be done if it&#8217;s not possible to meet those requirements?</p>
<p>The standard trick is to convert the query with ORs into multiple queries combined with UNION. The idea is that since OR predicates are evaluated separately and the result sets concatenated, we can do that manually by writing the queries separately and concatenating them using UNION or UNION ALL. (UNION ALL can only be safely used if the predicates are known to be mutually exclusive.)</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE Persons (
PersonID INT IDENTITY PRIMARY KEY,
FirstName    VARCHAR(30),
Surname VARCHAR(30),
Country CHAR(3),
RegistrationDate DATE
)

CREATE INDEX idx_Persons_FirstName ON dbo.Persons (FirstName) INCLUDE (Surname)
CREATE INDEX idx_Persons_Surname ON dbo.Persons (Surname) INCLUDE (FirstName)
GO

-- Data population using SQLDataGenerator

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel' OR Surname = 'Barnes'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel'
UNION
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Barnes'</pre>
<p>In this case, the OR can be replaced with a UNION and the results are the same. The UNION form is slightly less efficient according to the execution plan&#8217;s costings (60% compared to the OR at 40%), but the two queries have the same general form: two index seeks followed by some form of concatenation and duplicate removal.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult1.png"><img style="display: inline; border-width: 0px;" title="OrResult1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult1_thumb.png" border="0" alt="OrResult1" width="124" height="320" /></a><br />
<a href="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrExecPlan1.png"><img style="display: inline; border-width: 0px;" title="OrExecPlan1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrExecPlan1_thumb.png" border="0" alt="OrExecPlan1" width="484" height="298" /></a></p>
<p>So in that case it worked fine, although the original form was a little more efficient.<br />
<span id="more-963"></span><br />
Some care does need to be taken, however, as the query with OR and the query with UNION are not always equivalent. The difference has to do with the elimination of duplicate rows.</p>
<p>In an OR, if a row qualifies for both of the predicates, it&#8217;s only returned once. That should be obvious; it&#8217;s how things should work. We don&#8217;t want to see the row multiple times just because it qualifies for more than one of the OR predicates. If we change that to UNION ALL then the row will be returned twice, because it appears in both of the queries being concatenated and UNION ALL means combine without eliminating duplicates.</p>
<pre class="brush: sql; title: ; notranslate">SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Herman' OR Surname = 'Anderson'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Herman'
UNION ALL
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Anderson'</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult2a.png"><img style="display: inline; border-width: 0px;" title="OrResult2a" src="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult2a_thumb.png" border="0" alt="OrResult2a" width="124" height="292" /></a></p>
<p>In that example, Herman Anderson appears once in the results of the OR query and twice in the results of the UNION ALL. That&#8217;s because the row qualifies for both predicates. The OR eliminates the duplicate; the UNION ALL does not.</p>
<p>So change that UNION ALL to UNION so that the elimination of duplicate rows is done, the row appears only once and life is good again. Or is it?</p>
<pre class="brush: sql; title: ; notranslate">SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Alfred' OR Surname = 'Hickman'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Alfred'
UNION
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Hickman'
ORDER BY FirstName, Surname</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult2b.png"><img style="display: inline; border-width: 0px;" title="OrResult2b" src="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult2b_thumb.png" border="0" alt="OrResult2b" width="124" height="307" /></a></p>
<p>This time, Alfred Hickman appears twice in the results from the OR, but only once in the output from the UNION.</p>
<p>The difference comes in how the duplicates are eliminated. With an OR, SQL eliminates duplicates based on the key value, regardless of what may be in the select list. With a UNION, SQL eliminates duplicates based on the select list, regardless of what the key value may be. In the above example there were two rows in the table with the value ‘Alfred Hickman’, so the UNION returned only one of them. With UNION you can lose rows if they are duplicated in the table.</p>
<p>The solution&#8217;s fairly simple. If converting an OR into a UNION, ensure that the key column(s) are in the select list. Then the duplicate elimination done by the UNION will only remove rows that were part of both result sets, instead of also removing ones that really do appear twice in the table.</p>
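<p>As a minimal sketch of that fix against the Persons table above, adding PersonID (the primary key) to both select lists means the UNION can only collapse a row that both branches returned, so two different people who happen to share a name should both survive:</p>
<pre class="brush: sql; title: ; notranslate">-- PersonID is unique per row, so two result rows can only be duplicates
-- of each other if they are the same table row returned by both branches
SELECT PersonID, FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Alfred'
UNION
SELECT PersonID, FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Hickman'
ORDER BY FirstName, Surname</pre>
<p>Because PersonID is the clustered primary key here, it is carried in every nonclustered index on the table, so the two indexes created earlier should still cover both branches.</p>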
<p>So in conclusion, if you&#8217;re replacing a query using OR with a query using UNION, be careful with the finer details around duplicates. If you know the conditions are mutually exclusive, use UNION ALL. If you don&#8217;t, use UNION and ensure that the table&#8217;s key column(s) are present in the select list so that the UNION doesn&#8217;t remove rows that you don&#8217;t want it to remove.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/07/05/converting-or-to-union/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>To TOP or not to TOP an EXISTS</title>
		<link>http://sqlinthewild.co.za/index.php/2011/04/05/to-top-or-not-to-top-an-exists/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/04/05/to-top-or-not-to-top-an-exists/#comments</comments>
		<pubDate>Tue, 05 Apr 2011 16:30:00 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=988</guid>
		<description><![CDATA[Earlier this year I had a look at a query pattern that I often see on forums and in production code, that of the Distinct within an IN subquery. Today I&#8217;m going to look at a similar pattern, that being the use of TOP 1 within an EXISTS subquery. Three tests. First a straightforward exists [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier this year I had a look at a query pattern that I often see on forums and in production code, that of the Distinct within an IN subquery. Today I&#8217;m going to look at a similar pattern, that being the use of TOP 1 within an EXISTS subquery.</p>
<p>Three tests. First, a straightforward exists with no correlation (no where clause linking it to an outer query). Second, an exists with a complex query (one with a non-sargable where clause and a group by and having). Third, an exists subquery correlated to the outer query.</p>
<p>Table structures are nice and simple. In fact, for ease I&#8217;m going to use the same tables as I did back on the exists, in and inner join tests. Code to create and populate the tables is attached to the end of the post.</p>
<p>First up, a simple exists query, in an IF, just to be different.</p>
<pre class="brush: sql; title: ; notranslate">IF EXISTS (SELECT 1 FROM PrimaryTable_Medium)
PRINT 'Exists'

IF EXISTS (SELECT TOP (1) 1 FROM PrimaryTable_Medium)
PRINT 'Exists too'</pre>
<p>For a benchmark, a SELECT 1 FROM PrimaryTable_Medium has the following IO characteristics:</p>
<blockquote><p>Table &#8216;PrimaryTable_Medium&#8217;. Scan count 1, logical reads 89, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 15 ms,  elapsed time = 510 ms.</p></blockquote>
<p>Ignore the elapsed time, that&#8217;s likely mostly from displaying the records. I&#8217;m going to focus mostly on the CPU and IO.</p>
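<p>For anyone wanting to reproduce these figures, output in this format comes from the STATISTICS IO and STATISTICS TIME session options and appears in the Messages tab. A minimal sketch, assuming the test tables from the attached script exist:</p>
<pre class="brush: sql; title: ; notranslate">-- Report reads per table and CPU/elapsed time for each statement
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO

-- The benchmark query: a full scan of the narrowest index
SELECT 1 FROM PrimaryTable_Medium
GO

-- Switch the reporting off again when done
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO</pre>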
<p>Execution plans of the two exists variations are absolutely identical.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/03/TopExists1.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="TopExists1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/03/TopExists1_thumb.png" border="0" alt="TopExists1" width="484" height="270" /></a></p>
<p>The index operators are scans because there is no way they could be anything else, there&#8217;s no predicate so a seek is not possible. That said, it&#8217;s not a full index scan. The properties of the Index Scan show 1 row only (actual and estimated). So SQL did not read the entire index, just enough to evaluate the EXISTS, and that&#8217;s what it did in both cases. IO stats confirm that.</p>
<p><span id="more-988"></span></p>
<blockquote><p>Table &#8216;PrimaryTable_Medium&#8217;. Scan count 1, logical reads 2, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 0 ms,  elapsed time = 0 ms.<br />
Exists</p>
<p>Table &#8216;PrimaryTable_Medium&#8217;. Scan count 1, logical reads 2, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 0 ms,  elapsed time = 0 ms.<br />
Exists too</p></blockquote>
<p>Two reads in each case and a CPU time so low it&#8217;s immeasurable. A full scan of the index takes 89 reads (as shown earlier) so it should be clear that SQL read a minimal amount of data, both when the TOP was specified and when it wasn&#8217;t.</p>
<p>On to a more complex test. Again, using EXISTS within an IF</p>
<pre class="brush: sql; title: ; notranslate">IF EXISTS (
SELECT 1 FROM PrimaryTable_Medium
WHERE RIGHT(SomeColumn,2) &gt; 'HH'
GROUP BY LEFT(SomeColumn,1)
HAVING COUNT(*) &gt; 1
)
PRINT 'Exists Again'

IF EXISTS (
SELECT TOP (1) 1 FROM PrimaryTable_Medium
WHERE RIGHT(SomeColumn,2) &gt; 'HH'
GROUP BY LEFT(SomeColumn,1)
HAVING COUNT(*) &gt; 1
)
PRINT 'Still Exists'</pre>
<p>If I run just the SELECT 1 alone, 10 rows are returned.</p>
<p>Execution plans are a lot more complex, pretty much to be expected. They&#8217;re still identical, as are the IOs and CPU time.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/03/TopExists2.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="TopExists2" src="http://sqlinthewild.co.za/wp-content/uploads/2011/03/TopExists2_thumb.png" border="0" alt="TopExists2" width="484" height="174" /></a></p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;PrimaryTable_Medium&#8217;. Scan count 1, logical reads 89, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 31 ms,  elapsed time = 27 ms.<br />
Exists Again</p>
<p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;PrimaryTable_Medium&#8217;. Scan count 1, logical reads 89, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 31 ms,  elapsed time = 45 ms.<br />
Still Exists</p></blockquote>
<p>This time the index scan was a scan of the entire index (89 pages). Because of the aggregation and the having, SQL couldn&#8217;t abort the scan once it had what it needed. All rows needed to be returned so that the aggregation and subsequent filter could be done.</p>
<p>One last test, with an EXISTS subquery.</p>
<p>I&#8217;m going to create a secondary table that has only 20% of the values for SomeColumn in PrimaryTable_Medium, but has each one repeated 500 times for a total of 615000 rows.</p>
<pre class="brush: sql; title: ; notranslate">SELECT  ID ,
SomeColumn
FROM dbo.PrimaryTable_Medium pm
WHERE EXISTS (SELECT 1 FROM dbo.Secondary s WHERE pm.SomeColumn = s.SomeColumn)

SELECT  ID ,
SomeColumn
FROM dbo.PrimaryTable_Medium pm
WHERE EXISTS (SELECT TOP(1) 1 FROM dbo.Secondary s WHERE pm.SomeColumn = s.SomeColumn)</pre>
<p>Again, the execution plans are absolutely identical.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/03/TopExists3.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="TopExists3" src="http://sqlinthewild.co.za/wp-content/uploads/2011/03/TopExists3_thumb.png" border="0" alt="TopExists3" width="484" height="288" /></a></p>
<p>So, for that matter, are the execution statistics:</p>
<blockquote><p>Table &#8216;PrimaryTable_Medium&#8217;. Scan count 1, logical reads 22, physical reads 0.<br />
Table &#8216;Secondary&#8217;. Scan count 1, logical reads 1605, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 171 ms,  elapsed time = 443 ms.</p>
<p>Table &#8216;PrimaryTable_Medium&#8217;. Scan count 1, logical reads 22, physical reads 0.<br />
Table &#8216;Secondary&#8217;. Scan count 1, logical reads 1605, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 172 ms,  elapsed time = 387 ms.</p></blockquote>
<p>So, in conclusion, is there any point in adding a TOP to an exists subquery? Does it persuade SQL to return only the minimum information needed to satisfy the Exists?</p>
<p>No to both. The Exists operator itself tries to retrieve just the absolute minimum of information, so the addition of TOP 1 does nothing except add 5 characters to the query size.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/04/05/to-top-or-not-to-top-an-exists/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Statistics, row estimations and the ascending date column</title>
		<link>http://sqlinthewild.co.za/index.php/2011/03/22/statistics-row-estimations-and-the-ascending-date-column/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/03/22/statistics-row-estimations-and-the-ascending-date-column/#comments</comments>
		<pubDate>Tue, 22 Mar 2011 14:30:00 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=955</guid>
		<description><![CDATA[SQL&#8217;s auto-updating statistics go a fair way to making SQL Server a self-tuning database engine and in many cases they do a reasonably good job However there&#8217;s one place where the statistics&#8217; auto-update fails often and badly. That&#8217;s on the large table with an ascending column where the common queries are looking for the latest [...]]]></description>
			<content:encoded><![CDATA[<p>SQL&#8217;s auto-updating statistics go a fair way to making SQL Server a self-tuning database engine and in many cases they do a reasonably good job.</p>
<p>However there&#8217;s one place where the statistics&#8217; auto-update fails often and badly. That&#8217;s on the large table with an ascending column where the common queries are looking for the latest rows.</p>
<p>Let&#8217;s have a look at a common scenario.</p>
<p>We have a large table (imaginatively called &#8216;Transactions&#8217;) with a date time column (even more imaginatively called &#8216;TransactionDate&#8217;). This table gets about 80,000 new records a day and currently has around 8,000,000 records in it. So we can say roughly that another 1% is added to the table size daily. No records are updated and there&#8217;s a monthly purge of old data so that the total size remains about the same. A reasonably common real-life scenario.</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE Accounts (
AccountID INT IDENTITY PRIMARY KEY,
AccountNumber CHAR(8),
AccountType CHAR(2),
AccountHolder VARCHAR(50),
Filler CHAR(50) -- simulating other columns
)

CREATE TABLE Transactions (
TransactionID INT IDENTITY PRIMARY KEY NONCLUSTERED,
AccountID INT NOT NULL FOREIGN KEY REFERENCES Accounts (AccountID),
TransactionDate DATETIME NOT NULL DEFAULT GETDATE(),
TransactionType CHAR(2),
Amount NUMERIC(18,6),
Filler CHAR(150) -- Simulating other columns
)
GO
CREATE CLUSTERED INDEX idx_Transactions_TransactionDate
ON Transactions (TransactionDate)

CREATE NONCLUSTERED INDEX idx_Transactions_AccountID
ON Transactions (AccountID)

CREATE NONCLUSTERED INDEX idx_Accounts_AccountType
ON Accounts (AccountType)

-- Using RedGate's SQLDataGenerator to generate some data for this.</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/03/Accounts.png"><img style="display: inline; border-width: 0px;" title="Accounts" src="http://sqlinthewild.co.za/wp-content/uploads/2011/03/Accounts_thumb.png" border="0" alt="Accounts" width="162" height="244" /></a> <a href="http://sqlinthewild.co.za/wp-content/uploads/2011/03/Transactions.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="Transactions" src="http://sqlinthewild.co.za/wp-content/uploads/2011/03/Transactions_thumb.png" border="0" alt="Transactions" width="173" height="244" /></a></p>
<p>Day 1 of the month, the indexes have just been rebuilt (after the data purge) and the statistics associated with those have been updated. The latest value in the TransactionDate column is &#8216;2011/01/31&#8217; and the last value in the statistics histogram is &#8216;2011/01/31&#8217;. Life is good.</p>
<p>Day 2 of the month, there have been 80,000 new records added for the previous day. Only 1% of the table has been updated, so the automatic statistics update would not have triggered. The latest value in the TransactionDate column is &#8216;2011/02/01&#8217; and the last value in the statistics histogram is &#8216;2011/01/31&#8217;. Doesn&#8217;t look like a problem.</p>
<p>Fast forward another couple of days. Day 5 of the month. By this point 300,000 rows have been added since the beginning of the month. This amounts to just under 4% of the table, so the statistics auto-update (triggered at 20%) still would not have run. The latest value in the TransactionDate column is &#8216;2011/02/04&#8217; and the last value in the statistics histogram is &#8216;2011/01/31&#8217;. Starting to look less than ideal.</p>
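<p>One way to see that gap on a real system is to compare the top of the histogram with the actual maximum in the column. This is a sketch only, assuming the table and index names from the script above:</p>
<pre class="brush: sql; title: ; notranslate">-- When was each statistics object on the table last updated?
SELECT s.name AS StatsName, STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats s
WHERE s.object_id = OBJECT_ID('dbo.Transactions')

-- The last RANGE_HI_KEY in the histogram is the highest value the optimiser knows about
DBCC SHOW_STATISTICS ('dbo.Transactions', idx_Transactions_TransactionDate) WITH HISTOGRAM

-- The highest value actually in the table
SELECT MAX(TransactionDate) AS LatestTransaction
FROM dbo.Transactions</pre>
<p>If the last histogram step is several days behind the actual maximum, queries filtering for the latest dates are asking about a range the statistics know nothing about.</p>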
<p>So, what kind of effect does this have on the queries against that table?<span id="more-955"></span></p>
<p>Let&#8217;s assume there&#8217;s an important query that runs every morning to calculate the totals for the previous three days&#8217; transactions.</p>
<pre class="brush: sql; title: ; notranslate">CREATE PROCEDURE AccountPositionSummary (
 @EffectiveDate DATE
)
AS
SELECT a.AccountNumber, a.AccountHolder, t.TransactionType, SUM(t.Amount) AS AccountPosition, CAST(TransactionDate AS DATE) AS EffectiveDate
 FROM dbo.Accounts a
 INNER JOIN dbo.Transactions t ON a.AccountID = t.AccountID
 WHERE t.TransactionDate &gt;= @EffectiveDate
 GROUP BY a.AccountNumber, a.AccountHolder, t.TransactionType, CAST(t.TransactionDate AS DATE)
GO</pre>
<p>Day 1 (the 1st of Feb) the query is run over the records from the 29th January to the 31st. The stats histogram lists the 31st as the maximum value of the TransactionDate column (which it is) and so the optimiser is able to get a very accurate estimate of the rows affected.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/03/GoodExecPlan.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="GoodExecPlan" src="http://sqlinthewild.co.za/wp-content/uploads/2011/03/GoodExecPlan_thumb.png" border="0" alt="GoodExecPlan" width="364" height="90" /></a></p>
<p>The execution stats look pretty decent considering the amount of data in the table.</p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;Transactions&#8217;. Scan count 1, logical reads 6329, physical reads 0.<br />
Table &#8216;Accounts&#8217;. Scan count 1, logical reads 112, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 1076 ms,  elapsed time = 2661 ms.</p></blockquote>
<p>Day 2 (2nd Feb) the query is run over the records from the 30th January to the 1st Feb. The plan is unchanged and still fast, but there are early warning signs of a problem.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/03/EarlyWarning.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="EarlyWarning" src="http://sqlinthewild.co.za/wp-content/uploads/2011/03/EarlyWarning_thumb.png" border="0" alt="EarlyWarning" width="148" height="244" /></a></p>
<p>There&#8217;s a difference in the estimated and actual row counts, and not a small one. It&#8217;s not affecting the plan, yet, but there are several days still to go before the auto_update will kick in (at around 1% of the table modified each day, it&#8217;ll be about the 20th of the month before the auto update threshold is hit).</p>
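<p>The arithmetic behind that estimate, as a rough sketch. It assumes the auto-update threshold for a table of this size is 500 modifications plus 20% of the rows, and uses the row counts from the scenario above:</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @TableRows INT = 8000000   -- approximate size of Transactions
DECLARE @RowsPerDay INT = 80000    -- new rows added each day

-- Modifications needed before the auto-update fires, and how many days
-- of inserts it takes to accumulate them
SELECT 500 + (0.20 * @TableRows) AS ModificationsNeeded,
(500 + (0.20 * @TableRows)) / @RowsPerDay AS DaysOfInsertsNeeded</pre>
<p>That works out to a little over 20 days of inserts, which is where the 20th of the month comes from.</p>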
<p>On Day 5, what does the plan look like?</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/03/BadExecPlan.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="BadExecPlan" src="http://sqlinthewild.co.za/wp-content/uploads/2011/03/BadExecPlan_thumb.png" border="0" alt="BadExecPlan" width="364" height="84" /></a></p>
<p>Radically different. It&#8217;s worth noting that, because I&#8217;ve just been copying the previous day&#8217;s rows, the row count hasn&#8217;t changed at all, but we now have a nested loop join sitting in the middle. Nested loop joins do not work well with large numbers of rows in the outer table. Don&#8217;t be deceived by the narrow arrow leading to the nested loop. In the Management Studio display, the width of an arrow is in some cases based on the estimated row count (in this case 1) rather than the actual one (note that the arrow in question doesn&#8217;t have an actual row count on it).</p>
<p>So have the execution stats changed?</p>
<blockquote><p>Table &#8216;Accounts&#8217;. Scan count 0, logical reads 530784, physical reads 0.<br />
Table &#8216;Transactions&#8217;. Scan count 1, logical reads 6358, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 2839 ms,  elapsed time = 6746 ms.</p></blockquote>
<p>Um yeah, just slightly.</p>
<p>What&#8217;s happening here is that, because the new rows are all at the end of the index, the stats histogram doesn&#8217;t just show an estimate lower than the actual rows (as would happen for new rows inserted across the range of the data), it indicates that there are absolutely no matching rows. The optimiser then generates a plan optimal for one row.</p>
<p>Ironically, an ascending column is considered a good choice for a clustered index because it reduces fragmentation and page splits. Combine that with a query that always looks for the latest values and it&#8217;s very easy to end up with a query that intermittently performs absolutely terribly with no easy-to-see cause.</p>
<p>The fix isn&#8217;t hard. A scheduled statistics update, maybe daily depending on when data is loaded and what the queries filter for, fixes this completely. The trick is often realising that it is necessary.</p>
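<p>A minimal sketch of such a job step, assuming the table and index from the example above. Whether a full scan is affordable, and how often the job should run, depends on the system:</p>
<pre class="brush: sql; title: ; notranslate">-- Refresh only the statistics on the ascending date column,
-- reading every row so the histogram includes the newest dates
UPDATE STATISTICS dbo.Transactions idx_Transactions_TransactionDate WITH FULLSCAN</pre>
<p>Scheduling it to run after the daily load and before the morning query means the histogram&#8217;s maximum value matches the data the query is about to read.</p>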
<p>It is worth noting that the example I&#8217;ve contrived here is not an isolated example, and it&#8217;s on the low-end of the possible effects. At a previous job I saw a rather critical daily process that would go from around 30 minutes at the beginning of the month to several hours somewhere late in the second week of the month, because the stats on the datetime column indicated 0 rows, instead of the 5 million that the query would actually return.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/03/22/statistics-row-estimations-and-the-ascending-date-column/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Are int joins faster than string joins?</title>
		<link>http://sqlinthewild.co.za/index.php/2011/02/15/are-int-joins-faster-than-string-joins-2/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/02/15/are-int-joins-faster-than-string-joins-2/#comments</comments>
		<pubDate>Tue, 15 Feb 2011 14:30:18 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=869</guid>
		<description><![CDATA[This one comes up a lot on the forums, often as advice given… &#8220;You should use integers for the key rather than strings. The joins will be faster.&#8221; It sounds logical. Surely integers are easier to compare than large complex strings. Question is, is it true? This is going to take lots and lots of [...]]]></description>
			<content:encoded><![CDATA[<p>This one comes up a lot on the forums, often as advice given…</p>
<blockquote><p>&#8220;You should use integers for the key rather than strings. The joins will be faster.&#8221;</p></blockquote>
<p>It sounds logical. Surely integers are easier to compare than large complex strings. Question is, is it true?</p>
<p>This is going to take lots and lots of rows to get any significant results. I&#8217;m going to do two sets of tests. The first comparing query execution speed between string and integer data types on the join column while keeping the size of the join column the same between the two. This is to tell if there&#8217;s a difference just because of the data type.</p>
<p>The second test will have both the data type and the size of the join columns differing, while the total size of the table row will be kept the same. This is to answer the &#8216;string joins are slower because they are larger&#8217; argument.</p>
<h3>Test 1: Same key size, no indexes</h3>
<p>The two tables have the same size join column – a bigint in one and a char(8) in the other.</p>
<pre class="brush: sql; title: ; notranslate">SELECT t1.ID, t2.ID, t1.IntForeignKey, t2.SomeArbStatus
FROM dbo.TestingJoinsInt t1
INNER JOIN dbo.LookupTableInt t2 ON t1.IntForeignKey = t2.ID
GO

SELECT t1.ID, t2.ID, t1.StrForeignKey, t2.SomeArbStatus
FROM dbo.TestingJoinsString t1
INNER JOIN dbo.LookupTableString t2 ON t1.StrForeignKey = t2.ID
GO</pre>
<p>First up, a check of what Statistics IO and Statistics Time show for the two and whether there&#8217;s any difference in execution plan. (Full repro code is available for download, link at the end of the post.)</p>
<p><strong>Int joins</strong></p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;TestingJoinsInt&#8217;. Scan count 1, logical reads 66036, physical reads 0.<br />
Table &#8216;LookupTableInt&#8217;. Scan count 1, logical reads 735, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 2433 ms,  elapsed time = 32574 ms.</p></blockquote>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/IntJoins1.png"><img style="display: inline; border-width: 0px;" title="IntJoins1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/IntJoins1_thumb.png" border="0" alt="IntJoins1" width="364" height="146" /></a></p>
<p><strong>String joins</strong></p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;TestingJoinsString&#8217;. Scan count 1, logical reads 66036, physical reads 0.<br />
Table &#8216;LookupTableString&#8217;. Scan count 1, logical reads 735, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 3744 ms,  elapsed time = 33947 ms.</p></blockquote>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/StringJoins1.png"><img style="display: inline; border-width: 0px;" title="StringJoins1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/StringJoins1_thumb.png" border="0" alt="StringJoins1" width="364" height="133" /></a></p>
<p><span id="more-869"></span>Execution plan&#8217;s the same, but that shouldn&#8217;t really be a surprise. With no nonclustered indexes there have to be table scans (or clustered index scans) and, with the resultsets not ordered by the join key, a hash join is about the only join that could be used efficiently here.</p>
<p>The CPU time is the interesting thing. The string join used about 54% more CPU time than the integer join (3744 ms against 2433 ms). To check that the difference is consistent and not a once off, I&#8217;m going to run the same test 10 times each and use Profiler to catch the durations and CPU times and aggregate.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/ComparisonIntString.png"><img style="display: inline; border-width: 0px;" title="ComparisonIntString" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/ComparisonIntString_thumb.png" border="0" alt="ComparisonIntString" width="484" height="84" /></a></p>
<p>That&#8217;s a notable difference in the average CPU usage. Average of 31% greater CPU usage from the string join over the integer join.</p>
<p>Maybe an index will fix things…</p>
<h3>Test 2: Same key size, indexes on join column</h3>
<p>Same tables, just with a nonclustered index added on the foreign key column.</p>
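<p>A minimal sketch of the indexes for this test, assuming the first pair of tables uses the same column names as the later queries (IntForeignKey and StrForeignKey). The index names here are made up for illustration; the actual definitions are in the repo code linked at the end of the post.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical index names; column names assumed from the join queries
CREATE NONCLUSTERED INDEX idx_TestingJoinsInt_ForeignKey
ON dbo.TestingJoinsInt (IntForeignKey)

CREATE NONCLUSTERED INDEX idx_TestingJoinsString_ForeignKey
ON dbo.TestingJoinsString (StrForeignKey)
GO</pre>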
<p><strong>Int joins</strong></p>
<blockquote><p>Table &#8216;TestingJoinsInt&#8217;. Scan count 1, logical reads 4654.<br />
Table &#8216;LookupTableInt&#8217;. Scan count 1, logical reads 735.</p>
<p>SQL Server Execution Times:<br />
CPU time = 2043 ms,  elapsed time = 30993 ms.</p></blockquote>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/IntJoins2.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="IntJoins2" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/IntJoins2_thumb.png" border="0" alt="IntJoins2" width="364" height="145" /></a></p>
<p><strong>String joins:</strong></p>
<blockquote><p>Table &#8216;TestingJoinsString&#8217;. Scan count 1, logical reads 4654.<br />
Table &#8216;LookupTableString&#8217;. Scan count 1, logical reads 735.</p>
<p>SQL Server Execution Times:<br />
CPU time = 2995 ms,  elapsed time = 32904 ms.</p></blockquote>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/StringJoins2.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="StringJoins2" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/StringJoins2_thumb.png" border="0" alt="StringJoins2" width="364" height="140" /></a></p>
<p>The one scan has changed to an index scan and the join type is now merge join (as the indexes provide the join order), but the plan still has the same form (as would be expected) and there&#8217;s still a fairly substantial difference in the CPU times.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/ComparisonIntString2.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="ComparisonIntString2" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/ComparisonIntString2_thumb.png" border="0" alt="ComparisonIntString2" width="484" height="50" /></a></p>
<p>On average a 50% increase in CPU time. That&#8217;s pretty extreme.</p>
<h3>Test 3: Same row size, no indexes</h3>
<p>For this test, the join columns are now different sizes. (This is the second script in the attached repo code for anyone using that.) I&#8217;m using an int in the one table and a char(24) in the other. This is probably a little more realistic: if strings are being used as keys and join columns, there&#8217;s a very good chance that they will be longer than an int would be.</p>
<p>Straight into the testing, the query&#8217;s the same form, the names of the tables are the only things that changed.</p>
<pre class="brush: sql; title: ; notranslate">SELECT t1.ID, t2.ID, t1.IntForeignKey, t2.SomeArbStatus
 FROM dbo.TestingJoinsInt2 t1
 INNER JOIN dbo.LookupTableInt2 t2 ON t1.IntForeignKey = t2.ID
GO

SELECT t1.ID, t2.ID, t1.StrForeignKey, t2.SomeArbStatus
 FROM dbo.TestingJoinsString2 t1
 INNER JOIN dbo.LookupTableString2 t2 ON t1.StrForeignKey = t2.ID
GO</pre>
<p><strong>Int joins:</strong></p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;TestingJoinsInt2&#8217;. Scan count 1, logical reads 64342, physical reads 0.<br />
Table &#8216;LookupTableInt2&#8217;. Scan count 1, logical reads 685, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 2293 ms,  elapsed time = 30839 ms.</p></blockquote>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/IntJoins3.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="IntJoins3" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/IntJoins3_thumb.png" border="0" alt="IntJoins3" width="364" height="139" /></a></p>
<p><strong>String joins:</strong></p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;TestingJoinsString2&#8217;. Scan count 1, logical reads 64342, physical reads 0.<br />
Table &#8216;LookupTableString2&#8217;. Scan count 1, logical reads 688, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 4290 ms,  elapsed time = 36742 ms.</p></blockquote>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/StringJoin3.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="StringJoin3" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/StringJoin3_thumb.png" border="0" alt="StringJoin3" width="364" height="145" /></a></p>
<p>And we&#8217;re back to the same plan as in the first test – clustered index scans and hash join, for the same reasons. I seem to have messed up somewhere in trying to keep the tables the same size. Still, 3 reads difference on the lookup table is not really a large difference.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/ComparisonIntString3.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="ComparisonIntString3" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/ComparisonIntString3_thumb.png" border="0" alt="ComparisonIntString3" width="484" height="59" /></a></p>
<p>Just as in all the previous tests, the average CPU usage on the string join is markedly higher. This time it&#8217;s nearly 100% greater than for the int joins.</p>
<h3>Test 4: Same row size, index on join column</h3>
<p>In this test I&#8217;m breaking my own test rules a bit. While the table&#8217;s row size is the same between the two tables, the index row size is not. Still, I feel it&#8217;s fair enough as it reflects what would be done on a real system (no one pads out indexes for no reason).</p>
<p><strong>Int joins:</strong></p>
<blockquote><p>Table &#8216;TestingJoinsInt2&#8217;. Scan count 1, logical reads 3407, physical reads 0.<br />
Table &#8216;LookupTableInt2&#8217;. Scan count 1, logical reads 685, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 2075 ms,  elapsed time = 30459 ms.</p></blockquote>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/IntJoins4.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="IntJoins4" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/IntJoins4_thumb.png" border="0" alt="IntJoins4" width="364" height="164" /></a></p>
<p><strong>String joins:</strong></p>
<blockquote><p>Table &#8216;TestingJoinsString2&#8217;. Scan count 1, logical reads 9625, physical reads 0.<br />
Table &#8216;LookupTableString2&#8217;. Scan count 1, logical reads 688, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 3775 ms,  elapsed time = 34028 ms.</p></blockquote>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/StringJoin4.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="StringJoin4" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/StringJoin4_thumb.png" border="0" alt="StringJoin4" width="364" height="151" /></a></p>
<p>Same plan as the second test. I hope no one&#8217;s surprised by that.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/ComparisonIntString4.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="ComparisonIntString4" src="http://sqlinthewild.co.za/wp-content/uploads/2011/01/ComparisonIntString4_thumb.png" border="0" alt="ComparisonIntString4" width="484" height="50" /></a></p>
<p>And now we&#8217;re over a 100% increase in average CPU times for these two. The differing row sizes (with corresponding differing page counts) will be contributing to that, but just contributing, not causing, since we were seeing similar increases in earlier cases.</p>
<h3>Conclusion</h3>
<p>Is the use of integer data types better for join columns than strings? It certainly does appear so, and not insignificantly either. Bear in mind though that what I was doing here was a bit extreme: 2.5 million rows in a query with no filters applied. This shouldn&#8217;t be something that ever gets done in a real system. So it&#8217;s not a case that YMMV<sup>1</sup>, it&#8217;s a case that it almost certainly will. You probably will see different results, but it is definitely something worth testing when planning data types for a DB.</p>
<p>While this may not be the final nail in the coffin for natural keys, it is worth keeping in mind when choosing between natural and artificial keys for a system, especially one likely to process large numbers of rows, such as a decision support/datawarehouse system. Test carefully with expected data volumes and expected loads if you&#8217;re considering natural keys and decide based on the result of those tests.</p>
<p>Repo code: <a href="http://sqlinthewild.co.za/wp-content/uploads/2011/01/Int-vs-String-joins.zip">Int vs String joins</a></p>
<p>(1) YMMV = &#8216;Your mileage may vary&#8217;, a colloquialism meaning that you may get a different result from me.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/02/15/are-int-joins-faster-than-string-joins-2/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Is a clustered index best for range queries?</title>
		<link>http://sqlinthewild.co.za/index.php/2011/02/01/is-a-clustered-index-best-for-range-queries/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/02/01/is-a-clustered-index-best-for-range-queries/#comments</comments>
		<pubDate>Tue, 01 Feb 2011 14:30:00 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Indexes]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=822</guid>
		<description><![CDATA[I see a lot of advice that talks about the clustered index being the best index for use for range queries, that is queries with inequality filters, queries that retrieve ranges of rows, as opposed to singleton queries, queries that retrieve single rows (including, unfortunately, a Technet article). I suspect the reasoning behind this advice [...]]]></description>
			<content:encoded><![CDATA[<p>I see a lot of advice that talks about the clustered index being the best index for use for range queries, that is queries with inequality filters, queries that retrieve ranges of rows, as opposed to singleton queries, queries that retrieve single rows (including, unfortunately, a <a href="http://technet.microsoft.com/en-us/library/ms190639.aspx">Technet article</a>).</p>
<p>I suspect the reasoning behind this advice is the idea that the clustered index stores the data in order of the clustering key (ack) and hence it&#8217;s &#8216;logical&#8217; that such a structure would be best for range scans as SQL can simply start at the beginning of the range and read sequentially to the end.</p>
<p>Question is, is that really the case?</p>
<p>Let&#8217;s do some experiments and find out.</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE TestingRangeQueries (
ID INT IDENTITY,
SomeValue NUMERIC(7,2),
Filler CHAR(500) DEFAULT ''
)

-- 1 million rows
INSERT INTO TestingRangeQueries (SomeValue)
SELECT TOP (1000000) RAND(CAST(a.object_id AS BIGINT) + b.column_id*2511)
FROM msdb.sys.columns a CROSS JOIN msdb.sys.columns b

-- One cluster and two nonclustered indexes on the column that will be used for the range filter

CREATE CLUSTERED INDEX idx_RangeQueries_Cluster
ON TestingRangeQueries (ID)

CREATE NONCLUSTERED INDEX idx_RangeQueries_NC1
ON TestingRangeQueries (ID)

CREATE NONCLUSTERED INDEX idx_RangeQueries_NC2
ON TestingRangeQueries (ID)
INCLUDE (SomeValue)
GO</pre>
<p>The query that I&#8217;ll be testing with will do a sum of the SomeValue column for a large range of ID values. That means that of the three indexes that I&#8217;m testing, one is clustered, one is a nonclustered that does not cover the query and the third is a covering nonclustered index.</p>
<pre class="brush: sql; title: ; notranslate">SELECT SUM(SomeValue)
FROM TestingRangeQueries
WHERE ID BETWEEN 20000 and 200000 -- 180 001 rows, 18% of the table</pre>
<p>I&#8217;m going to run the same range scan query three times, each with an index hint so that SQL will use the three different indexes, regardless of which one it thinks is best.</p>
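<p>The hinted queries would look something like this. It&#8217;s a sketch using the index names created above; the STATISTICS settings are what produce the read counts and timings quoted for each run.</p>
<pre class="brush: sql; title: ; notranslate">SET STATISTICS IO ON
SET STATISTICS TIME ON
GO

-- Force the clustered index
SELECT SUM(SomeValue)
FROM TestingRangeQueries WITH (INDEX (idx_RangeQueries_Cluster))
WHERE ID BETWEEN 20000 AND 200000

-- Force the nonclustered index that does not cover the query
SELECT SUM(SomeValue)
FROM TestingRangeQueries WITH (INDEX (idx_RangeQueries_NC1))
WHERE ID BETWEEN 20000 AND 200000

-- Force the covering nonclustered index
SELECT SUM(SomeValue)
FROM TestingRangeQueries WITH (INDEX (idx_RangeQueries_NC2))
WHERE ID BETWEEN 20000 AND 200000
GO</pre>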
<p>First up, the clustered index.</p>
<p>As expected, we get a clustered index seek (the predicate is SARGable) and a stream aggregate.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/12/ClusteredIndex.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="ClusteredIndex" src="http://sqlinthewild.co.za/wp-content/uploads/2010/12/ClusteredIndex_thumb.png" border="0" alt="ClusteredIndex" width="484" height="86" /></a></p>
<blockquote><p>Table &#8216;TestingRangeQueries&#8217;. Scan count 1, logical reads 12023, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 94 ms,  elapsed time = 110 ms.</p></blockquote>
<p><span id="more-822"></span></p>
<p>So if the advice is correct, this should be the best (lowest CPU, lowest IO). Let&#8217;s see…</p>
<p>The first nonclustered index does not cover the query. Hence, seeing as this query returns a substantial portion of the table, we could assume that the optimiser probably <a href="http://sqlinthewild.co.za/index.php/2009/01/09/seek-or-scan/">wouldn&#8217;t choose to use it</a> because of the cost of the key lookups. If that is the case, then the query probably won&#8217;t be very efficient if I force the use of that index.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/12/NonclusteredIndex1.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="NonclusteredIndex1" src="http://sqlinthewild.co.za/wp-content/uploads/2010/12/NonclusteredIndex1_thumb.png" border="0" alt="NonclusteredIndex1" width="484" height="115" /></a></p>
<blockquote><p>Table &#8216;TestingRangeQueries&#8217;. Scan count 1, logical reads 551413, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 562 ms,  elapsed time = 560 ms.</p></blockquote>
<p>Ow. Not very efficient at all. Those key lookups hurt.</p>
<p>One last index to test, the covering non-clustered index.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/12/NonclusteredIndex2.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="NonclusteredIndex2" src="http://sqlinthewild.co.za/wp-content/uploads/2010/12/NonclusteredIndex2_thumb.png" border="0" alt="NonclusteredIndex2" width="484" height="95" /></a></p>
<blockquote><p>Table &#8216;TestingRangeQueries&#8217;. Scan count 1, logical reads 338, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 62 ms,  elapsed time = 67 ms.</p></blockquote>
<p>2/3 the CPU usage of the query using the clustered index and about 3% of the reads. No question about it, this one&#8217;s faster and less resource intensive. That pretty much invalidates the claim that the clustered index is best for range queries.</p>
<p>So what&#8217;s going on here?</p>
<p>The TechNet article I linked to at the beginning of this post states the following as reasoning for recommending a clustered index for range queries:</p>
<blockquote><p>After the row with the first value is found by using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent.</p></blockquote>
<p>Um, well, ignoring that there&#8217;s no guarantee of physical adjacency with an index at all, how does this differ from a nonclustered index?</p>
<p>In a clustered index, the leaf pages are logically ordered by the clustered index key (meaning that SQL can follow a page&#8217;s next page pointer to get the next page in the key order). To do a range query using the clustered index, SQL will seek down the b-tree to the start of the range and then read along the leaf pages, following the next page pointers, until it reaches the end of the range.</p>
<p>In a nonclustered index, the leaf pages are logically ordered by the index key (just the same as in a cluster). To do a range query using the nonclustered index, SQL will seek down the b-tree to the start of the range and then read along the leaf pages, following the next page pointers, until it reaches the end of the range. If additional columns are needed, SQL will then do a key/RID lookup for each row to retrieve the additional columns.</p>
<p>Not much difference there, other than the key lookups. So &#8216;physical adjacency&#8217; is pretty much ruled out as a reason for using the clustered index (if it was even true).</p>
<p>What is important, as we saw from the two tests of the nonclustered index, is that when the query is retrieving a significant portion of the table (and by &#8216;significant&#8217; I mean more than about 1%), the index needs to be covering, or the cost of the key lookups becomes overwhelming. Hence, what we want for a range query is a covering index.</p>
<p>The clustered index is always covering, because it contains, at the leaf level, all the columns of the table. It is, however, the largest index on a table<sup>1</sup>. The larger the index, the less efficient that index becomes. Hence while the clustered index is good for a range query, it&#8217;s not the best possible index for a range query.</p>
<p>The best possible index for a range query is the smallest index that is seekable and covers the query (the same as for just about any other query).</p>
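<p>To see how the three test indexes compare in size, something like the following should work. It&#8217;s a sketch that reads page counts from sys.dm_db_partition_stats; the exact numbers will depend on the data and on fragmentation.</p>
<pre class="brush: sql; title: ; notranslate">-- Page counts per index on the test table, largest first
SELECT i.name AS IndexName,
    i.type_desc AS IndexType,
    SUM(ps.used_page_count) AS UsedPages,
    SUM(ps.in_row_data_page_count) AS LeafPages
FROM sys.indexes i
    INNER JOIN sys.dm_db_partition_stats ps
        ON ps.object_id = i.object_id AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('TestingRangeQueries')
GROUP BY i.name, i.type_desc
ORDER BY UsedPages DESC</pre>
<p>With the 500-byte Filler column in every leaf row, the clustered index should come out far larger than either nonclustered index, which is what the read counts in the tests reflect.</p>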
<p>Now it&#8217;s not always possible to cover a query, and some queries shouldn&#8217;t be covered. There will be times when the cluster is the best choice for range queries, either because of the number of columns required or because just about every query filters on a particular column and that column is a <a href="http://www.sqlservercentral.com/articles/Indexing/68563/">good choice for the cluster</a>. Just don&#8217;t make the mistake of thinking it&#8217;s the only choice.</p>
<p>(1) It is possible to have a nonclustered index that&#8217;s larger than the clustered index. Takes some work though and is far from a usual case.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/02/01/is-a-clustered-index-best-for-range-queries/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Distincting an IN subquery</title>
		<link>http://sqlinthewild.co.za/index.php/2011/01/18/distincting-an-in-subquery/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/01/18/distincting-an-in-subquery/#comments</comments>
		<pubDate>Tue, 18 Jan 2011 14:00:18 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=812</guid>
		<description><![CDATA[This is going to be a quick one… I keep seeing forum code (and production code) that includes the DISTINCT in IN or EXISTS subqueries. The rationale is given either as a performance enhancement or as necessary for correct results. Is it necessary or useful? Only one way to find out. Let&#8217;s check for correct [...]]]></description>
			<content:encoded><![CDATA[<p>This is going to be a quick one…</p>
<p>I keep seeing forum code (and production code) that includes the DISTINCT in IN or EXISTS subqueries. The rationale is given either as a performance enhancement or as necessary for correct results.</p>
<p>Is it necessary or useful? Only one way to find out.</p>
<p>Let&#8217;s check for correct results first, because that can be done with nice small tables.</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE DistinctOuter (
ID INT
);

CREATE TABLE DistinctInner (
ID INT
);

INSERT INTO DistinctOuter
VALUES (1), (2), (3), (4), (5), (6), (7), (8)

INSERT INTO DistinctInner
VALUES (1), (2), (2), (2), (2), (4), (6), (7)</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/12/DistinctIN.png"><img style="display: inline; border-width: 0px;" title="DistinctIN" src="http://sqlinthewild.co.za/wp-content/uploads/2010/12/DistinctIN_thumb.png" border="0" alt="DistinctIN" width="364" height="298" /></a></p>
<p><span id="more-812"></span></p>
<p>No difference there, results are the same. I&#8217;m not going to run the test for EXISTS because, if anyone remembers how EXISTS works (or remembers a <a href="http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/">blog post</a> I wrote a while back), EXISTS doesn&#8217;t depend on what&#8217;s in the SELECT clause at all, it just looks for existence of rows, and DISTINCT cannot remove unique rows, just duplicates.</p>
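<p>For anyone who wants to reproduce the comparison, queries along these lines will do it. The exact queries in the screenshot aren&#8217;t in the text, so their shape here is an assumption. The EXISTS pair is included for completeness.</p>
<pre class="brush: sql; title: ; notranslate">-- IN, without and with DISTINCT
SELECT ID FROM DistinctOuter
WHERE ID IN (SELECT ID FROM DistinctInner)

SELECT ID FROM DistinctOuter
WHERE ID IN (SELECT DISTINCT ID FROM DistinctInner)

-- EXISTS, without and with DISTINCT
SELECT o.ID FROM DistinctOuter o
WHERE EXISTS (SELECT 1 FROM DistinctInner i WHERE i.ID = o.ID)

SELECT o.ID FROM DistinctOuter o
WHERE EXISTS (SELECT DISTINCT 1 FROM DistinctInner i WHERE i.ID = o.ID)

-- Each of the four returns the same five rows: 1, 2, 4, 6 and 7</pre>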
<p>A look at the execution plan shows why there are no duplicate values returned in the first query (the one without DISTINCT).</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/12/DistinctIN2.png"><img style="display: inline; border-width: 0px;" title="DistinctIN2" src="http://sqlinthewild.co.za/wp-content/uploads/2010/12/DistinctIN2_thumb.png" border="0" alt="DistinctIN2" width="364" height="164" /></a></p>
<p>That&#8217;s a semi-join there, not a complete join. A semi-join is a join that just checks for matches but doesn&#8217;t return rows from the second table. Since it&#8217;s just a check for existence, duplicate rows in the inner table are not going to make any difference to the results.</p>
<p>So that answers the correctness aspect: DISTINCT is not necessary to get correct results. But does it improve performance by having it there? Or does it perhaps reduce the performance? Time for larger tables.</p>
<p>Stolen from my <a href="http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/">last look</a> at EXISTS and IN:</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE PrimaryTable_Large (
id INT IDENTITY PRIMARY KEY,
SomeColumn char(4) NOT NULL,
Filler CHAR(100)
);

CREATE TABLE SecondaryTable_Large (
id INT IDENTITY PRIMARY KEY,
LookupColumn char(4) NOT NULL,
SomeArbDate Datetime default getdate()
);
GO

INSERT INTO PrimaryTable_Large (SomeColumn)
SELECT top 1000000
char(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) +
char(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))
from msdb.sys.columns a cross join msdb.sys.columns b;

INSERT INTO SecondaryTable_Large (LookupColumn)
SELECT SomeColumn
FROM PrimaryTable_Large TABLESAMPLE (25 PERCENT);</pre>
<p>Some row counts first.</p>
<ul>
<li>Total rows in PrimaryTable_Large: 1000000</li>
<li>Total rows in SecondaryTable_Large: 256335</li>
<li>Total distinct values in LookupColumn in SecondaryTable_Large: 10827</li>
</ul>
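<p>Those counts come from queries like the ones below. TABLESAMPLE returns an approximate percentage of the rows, so the last two numbers will differ from mine on any other run of the setup script.</p>
<pre class="brush: sql; title: ; notranslate">SELECT COUNT(*) AS PrimaryRows FROM PrimaryTable_Large

SELECT COUNT(*) AS SecondaryRows FROM SecondaryTable_Large

SELECT COUNT(DISTINCT LookupColumn) AS DistinctLookupValues
FROM SecondaryTable_Large</pre>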
<p>First test is without indexes on the lookup columns:</p>
<pre class="brush: sql; title: ; notranslate">SELECT ID, SomeColumn FROM PrimaryTable_Large
WHERE SomeColumn IN (SELECT LookupColumn FROM SecondaryTable_Large)

SELECT ID, SomeColumn FROM PrimaryTable_Large
WHERE SomeColumn IN (SELECT DISTINCT LookupColumn FROM SecondaryTable_Large)</pre>
<p>The reads are identical, which shouldn&#8217;t be a surprise as there&#8217;s no way with the current tables to run those queries without doing a full table scan.</p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;PrimaryTable_Large&#8217;. Scan count 1, logical reads 14548, physical reads 0.<br />
Table &#8216;SecondaryTable_Large&#8217;. Scan count 1, logical reads 798, physical reads 0.</p></blockquote>
<p>For durations and CPU, I&#8217;m going to run each 10 times and aggregate the results from the Profiler SQL:BatchCompleted event. The results are just about identical.</p>
<ul>
<li>IN without DISTINCT: CPU 1.21 seconds, duration 12.2 seconds</li>
<li>IN with DISTINCT: CPU 1.25 seconds, duration 11.9 seconds</li>
</ul>
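<p>One simple way to get ten executions of each, assuming the queries are run from Management Studio or sqlcmd, is the count argument to GO. It&#8217;s a client feature rather than T-SQL, so it won&#8217;t work from application code. Each execution shows up as its own batch in the trace.</p>
<pre class="brush: sql; title: ; notranslate">SELECT ID, SomeColumn FROM PrimaryTable_Large
WHERE SomeColumn IN (SELECT LookupColumn FROM SecondaryTable_Large)
GO 10

SELECT ID, SomeColumn FROM PrimaryTable_Large
WHERE SomeColumn IN (SELECT DISTINCT LookupColumn FROM SecondaryTable_Large)
GO 10</pre>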
<p>Furthermore, the execution plans are identical. Something interesting to notice in this case is that the join is not a semi-join, it&#8217;s a complete join. To ensure that the complete join doesn&#8217;t return duplicate rows (which would be incorrect), there&#8217;s a hash match (aggregate) right before the join that removes duplicate rows from the inner resultset. That aggregate is present in both execution plans, whether or not DISTINCT is specified.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/12/LargeINNoIndexes.png"><img style="display: inline; border-width: 0px;" title="LargeINNoIndexes" src="http://sqlinthewild.co.za/wp-content/uploads/2010/12/LargeINNoIndexes_thumb.png" border="0" alt="LargeINNoIndexes" width="484" height="266" /></a></p>
<p>One last question to answer &#8211; does the presence of indexes change anything?</p>
<pre class="brush: sql; title: ; notranslate">CREATE INDEX idx_Primary
ON dbo.PrimaryTable_Large (SomeColumn)

CREATE INDEX idx_Secondary
ON dbo.SecondaryTable_Large (LookupColumn)</pre>
<p>The execution plan has changed, in operators if not in general form. The hash join is replaced by a merge join (still a complete join, not a semi-join), the hash match (aggregate) has been replaced by a stream aggregate and the clustered index scans are now (nonclustered) index scans.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/12/LargeINIndexes.png"><img style="display: inline; border-width: 0px;" title="LargeINIndexes" src="http://sqlinthewild.co.za/wp-content/uploads/2010/12/LargeINIndexes_thumb.png" border="0" alt="LargeINIndexes" width="484" height="278" /></a></p>
<p>The reads are still identical between the two, which should be no surprise at all. As for the durations:</p>
<ul>
<li>IN without DISTINCT: CPU 0.82 seconds, duration 10.1 seconds</li>
<li>IN with DISTINCT: CPU 0.79 seconds, duration 10.5 seconds</li>
</ul>
<p>Again so close that the small difference should be ignored.</p>
<p>So in conclusion, is there any need or use for DISTINCT in the subquery for an IN predicate? By all appearances, none whatsoever. The SQL query optimiser is smart enough to ignore the specified DISTINCT if it&#8217;s not necessary (as we saw in the first example) and to add an operator to remove duplicates if it is necessary (as we saw in the 2nd and 3rd examples), regardless of whether or not there&#8217;s a DISTINCT specified in the query.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/01/18/distincting-an-in-subquery/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Hit and miss</title>
		<link>http://sqlinthewild.co.za/index.php/2010/07/27/hit-and-miss/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/07/27/hit-and-miss/#comments</comments>
		<pubDate>Tue, 27 Jul 2010 14:00:10 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Execution Plans]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=377</guid>
		<description><![CDATA[Or &#8220;Monitoring plan cache usage&#8221; For people interested in the details of how SQL is using and reusing execution plans, there are some useful events in profiler for watching this in detail, under the Stored procedure group: SP:CacheMiss SP:CacheInsert SP:CacheHit SP:CacheRemove SP:Recompile SP:StmtRecompile Additionally there&#8217;s the SQL:StmtRecompile event under the TSQL group. For now, I [...]]]></description>
			<content:encoded><![CDATA[<p>Or &#8220;<em>Monitoring plan cache usage</em>&#8221;</p>
<p>For people interested in the details of how SQL is using and reusing execution plans, there are some useful events in Profiler for watching this in detail, under the Stored Procedures group:</p>
<ul>
<li>SP:CacheMiss</li>
<li>SP:CacheInsert</li>
<li>SP:CacheHit</li>
<li>SP:CacheRemove</li>
<li>SP:Recompile</li>
<li>SP:StmtRecompile</li>
</ul>
<p>Additionally there&#8217;s the SQL:StmtRecompile event under the TSQL group.</p>
<p>For now, I just want to look briefly at the CacheMiss and CacheHit events.</p>
<p>One word of caution early on, these are frequently occurring events and it may not be a good idea to trace these on busy production servers. If you do need to, keep the duration of the trace short and the columns to a minimum.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/07/CacheEvents.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="CacheEvents" src="http://sqlinthewild.co.za/wp-content/uploads/2010/07/CacheEvents_thumb.png" border="0" alt="CacheEvents" width="244" height="157" /></a></p>
<h3>CacheMiss</h3>
<p>The cache miss event fires any time SQL looks for the execution plan for an object or ad-hoc batch and does not find it in the plan cache.</p>
<p>For an object (scalar function, multi-statement table-valued function, stored procedure or trigger) the match is done on the object ID (along with some of the connection&#8217;s SET options and possibly the database user and a couple of other factors<sup>1</sup>). For an ad-hoc batch, the match is done on a hash of the text of the batch (along with some of the connection&#8217;s SET options and possibly the database user).</p>
<p>When testing stored procedures from Management Studio (or another SQL querying tool), two CacheMiss events will appear in the trace.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/07/CacheMiss.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="CacheMiss" src="http://sqlinthewild.co.za/wp-content/uploads/2010/07/CacheMiss_thumb.png" border="0" alt="CacheMiss" width="490" height="87" /></a></p>
<p>What&#8217;s going on here?</p>
<p><span id="more-377"></span></p>
<p>Let&#8217;s start from the bottom and work up. The last event there, the SP:Completed records the completion of the stored procedure and lists both the ObjectID and ObjectName and, if I check in the database, ObjectID 1301579675 does indeed belong to the stored procedure FireCacheEvents.</p>
<p>The second event (the second cache miss) has the same ObjectID and ObjectName. So this is the failed cache lookup for the stored procedure (failed because this is the first time I ran it and the plan was, as a result, not in cache).</p>
<p>The first entry is the one that&#8217;s curious. The ObjectName is not populated and the ObjectID doesn&#8217;t match anything in the database. So what is the cache lookup trying to find?</p>
<p>The TextData column gives a hint. What it&#8217;s trying to find is a cached plan for the ad-hoc batch containing the whole of the text submitted to SQL (in this case just &#8216;EXEC FireCacheEvents&#8217;). This too could contain queries (though it doesn&#8217;t in this case) and needs a plan lookup. With just an EXEC in the batch, it won&#8217;t have a plan and hence won&#8217;t be found in cache, but there&#8217;s still a lookup for it.</p>
<p>If there was any SELECT, INSERT, UPDATE or DELETE (or MERGE on SQL 2008) statement within that ad-hoc batch, the batch would also be cached and future cache lookups would succeed but, since it&#8217;s just an EXEC, there&#8217;s no plan to cache.</p>
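<p>One way to check what actually ended up in cache is to query the plan cache DMVs after running the batch. This is a sketch: the LIKE filter assumes the procedure name from the trace above, and it will also match this checking query itself, which gets cached as an ad-hoc plan.</p>
<pre class="brush: sql; title: ; notranslate">-- Cached plans whose text mentions the procedure
SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.text
FROM sys.dm_exec_cached_plans cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%FireCacheEvents%'</pre>
<p>The procedure should appear with an objtype of Proc. Going by the behaviour described here, a batch consisting of nothing but the EXEC shouldn&#8217;t have a row of its own.</p>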
<p>This additional cache lookup won&#8217;t occur if the execution is via RPC (e.g. a .NET call with SqlCommand.CommandType = CommandType.StoredProcedure), but it will occur any time an ad-hoc SQL batch is submitted (e.g. from Management Studio, or from .NET if SqlCommand.CommandType is set to CommandType.Text).</p>
<p>That&#8217;s ad-hoc batches and stored procs. Let&#8217;s try the third possibility &#8211; a parameterised query.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/07/CacheMissparameterised.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="CacheMiss parameterised" src="http://sqlinthewild.co.za/wp-content/uploads/2010/07/CacheMissparameterised_thumb.png" border="0" alt="CacheMiss parameterised" width="496" height="61" /></a></p>
<p>This looks semi-familiar. The first CacheMiss is again for the entirety of the ad-hoc batch (consisting of the DECLARE and the EXEC in this case). The second looks similar, something in TextData and no ObjectName. This is the parameterised query, identifiable as parameterised by the (@Status Char(1)) parameter definition right at the beginning. Again here the ObjectID is a hash of the text and doesn&#8217;t match a real object in the database.</p>
<p>It is worth noting that, if both the CacheMiss and CacheInsert events are traced, the CacheMiss events will only appear if there is no subsequent CacheInsert for the same ObjectID. If I ran the exact same script as above, but had the SP:Completed, SP:CacheMiss and SP:CacheInsert events in the trace, there would still be only three events recorded: the cache miss for the ad-hoc batch (as that&#8217;s not cached there&#8217;s no matching CacheInsert event), the CacheInsert for the stored proc and then the SP:Completed. The CacheMiss for the procedure wouldn&#8217;t appear, though its presence can be inferred from the presence of the CacheInsert.</p>
<p>That should about cover it for the CacheMiss. If the failed cache lookup is for an ad-hoc batch or parameterised query, the TextData column will be populated with the contents of the batch or query and the ObjectID will be a hash of the text (and shouldn&#8217;t match any object in the database). If the failed cache lookup is for a procedure (or function or trigger), the ObjectName column is usually (not always) populated with the name of the object, TextData is blank and the ObjectID matches the ObjectID of the object in the database.</p>
<h3>CacheHit</h3>
<p>Onto the CacheHit event. This, as its name implies, is the opposite to the CacheMiss. The CacheMiss indicates that a lookup to the plan cache failed to find a matching plan. The CacheHit indicates that a lookup to the plan cache succeeded in finding a matching plan (based on object id, set options, maybe user, and various other conditions).</p>
<p>It&#8217;s not certain, even if the cache lookup succeeds, that the plan will indeed be used for the execution of the query/batch/procedure as there are a number of stability and optimality related checks that will be done before the plan is used.</p>
<p>So let&#8217;s see how this event looks.</p>
<pre class="brush: sql; title: ; notranslate">EXEC FireCacheEvents
GO

SELECT ID, SomeDate, Status
FROM dbo.TestingCacheEvents
WHERE Status = 'B'
GO </pre>
<p>Two ad-hoc batches, first with just a stored procedure call, second with just an ad-hoc SQL statement.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/07/CacheHit.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="CacheHit" src="http://sqlinthewild.co.za/wp-content/uploads/2010/07/CacheHit_thumb.png" border="0" alt="CacheHit" width="492" height="85" /></a></p>
<p>No big surprises here, not after looking at the CacheMiss events. The first is the CacheMiss for the first of the ad-hoc batches, the one with just the EXEC in it. Since that is not cached, there will be a CacheMiss every time that executes.</p>
<p>The first of the CacheHit events is for the stored procedure. As with the CacheMiss, this CacheHit for the stored procedure has an ObjectID that matches the ObjectID for that procedure in the database, and the ObjectName column is populated with (surprise) the object name while the TextData is blank.</p>
<p>The second CacheHit is for the ad-hoc batch with the SELECT statement. As is probably expected by this point, the ObjectID there is just a hash of the text, the ObjectName is blank and the TextData is populated with the full text of the batch.</p>
<h3>In Conclusion</h3>
<p>If you&#8217;re monitoring cache usage with the CacheMiss and CacheHit events, there are two different ways to identify what the lookup was looking for.</p>
<p>If the lookup was for the plan of a stored procedure, trigger or function, the ObjectID column contains a value that matches the ObjectID for that object in the database. The TextData column is blank and the ObjectName column is (usually) populated with the name of the object. I did encounter a couple of cases where the ObjectName was blank for a CacheMiss event for a stored procedure, not quite sure why. More investigation is necessary.</p>
<p>If the lookup was for the plan of an ad-hoc batch or parameterised query, the ObjectID contains a meaningless value, the ObjectName column is blank and the TextData contains the entirety of the batch/query.</p>
<p>It may also be worth mentioning that the DatabaseID column is populated for all CacheMiss and CacheHit events, regardless of what the lookup is looking for. Additionally the DatabaseName column is populated for all CacheHit events (but is not an available column for the CacheMiss).</p>
<p>(1) For anyone who wants more information, there are two excellent resources available from Microsoft:</p>
<ul>
<li><a href="http://technet.microsoft.com/en-gb/library/cc966425.aspx">Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/ee343986.aspx">Plan Caching in SQL Server 2008</a></li>
</ul>
<p>Both go extremely deep into caching, what influences matching of plans, plan reuse, recompilations and the plan cache itself.</p>
<p>Reproduction code:</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE TestingCacheEvents (
 ID INT IDENTITY PRIMARY KEY,
 SomeDate DATETIME,
 Status CHAR(1),
 Filler CHAR(300) DEFAULT ' '
);
GO

INSERT INTO TestingCacheEvents (SomeDate, Status)
SELECT TOP (10000)
 DATEADD(dd,FLOOR(RAND(a.object_id+b.column_id*5000)*500),'2000/01/01'),
 CHAR(65+FLOOR(RAND(b.object_id+a.column_id*5000)*10))
 FROM master.sys.columns a CROSS JOIN master.sys.columns b;
GO

CREATE PROCEDURE FireCacheEvents
AS
 SELECT ID, SomeDate, Status
 FROM TestingCacheEvents
 WHERE Status = 'G'

GO</pre>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/07/27/hit-and-miss/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Root of all Evil</title>
		<link>http://sqlinthewild.co.za/index.php/2010/03/11/the-root-of-all-evil/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/03/11/the-root-of-all-evil/#comments</comments>
		<pubDate>Thu, 11 Mar 2010 14:00:51 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[Rants]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=563</guid>
		<description><![CDATA[Or “Shot gun query tuning” There have been a fair few forums questions in recent months asking for help in removing index scans, loop joins, sorts or other, presumed, slow query operators. There have been just as many asking how to replace a subquery with a join or a join with a subquery or similar [...]]]></description>
			<content:encoded><![CDATA[<p>Or “<em>Shot gun query tuning</em>”</p>
<p>There have been a fair few forum questions in recent months asking for help in removing index scans, loop joins, sorts or other, presumed, slow query operators. There have been just as many asking how to replace a subquery with a join, or a join with a subquery, or similar aspects of a query, usually for performance reasons.</p>
<p>The first question that I have to ask when looking at requests like that is &#8220;Why?&#8221;</p>
<p>Why is removing a particular query operator the goal? Why is changing a where clause predicate the goal? If it’s to make the query faster, has the query been examined and has it been confirmed that query operator or predicate really is the problem?</p>
<p>The title of this post refers to a comment I’ve seen again and again in blogs or articles about front-end development. &#8220;<em>Premature optimisation is the root of all evil.</em>&#8221; It’s true in the database field as well.</p>
<p><span id="more-563"></span></p>
<p>While optimisation is very important in database development, trying to optimise queries without any idea where the problem with the query is, or even if the query is a problem at all, is about as effective in fixing a database performance problem as using a shotgun from 100 meters is in killing mosquitoes. If you hit the problem, it’s by sheer luck and nothing else.</p>
<p>There are two sides to this problem.</p>
<p>The first aspect of this is, during development, spending time on optimising a query (or stored procedure) without any idea whether or not the query is inefficient and no idea whether or not the changes made make any improvement or not.</p>
<p>First, this is a waste of time that could be better spent developing other queries. Second, it creates an incorrect impression that the queries have been optimised when in fact nothing of the sort has been done.</p>
<p>The second aspect is when, with a production database that is performing badly, queries are modified almost at random in an attempt to fix the performance problem quickly.</p>
<p>This almost never works. It wastes time fixing stuff that very likely isn&#8217;t broken in the first place, all the while the database performance deteriorates and management curses SQL Server as &#8216;nonscalable&#8217;.</p>
<p>So, what is the right approach for the above two scenarios?</p>
<ol>
<li>Don&#8217;t optimise queries without knowing if they need it.</li>
<li>Don&#8217;t optimise queries without knowing if they need it. <sup>1</sup></li>
</ol>
<h3>New development</h3>
<p>When queries and stored procedures are written, they need to be tested against a representative data set on a server with a representative workload, and their performance characteristics evaluated to see if they are acceptable. If the query&#8217;s performance characteristics are acceptable, then that query requires no optimisation.<sup>2</sup></p>
<p>This doesn&#8217;t mean write bad code and push it to production. It means write good, solid code, following accepted coding standards, ensure that it runs acceptably against production-volumes of data, and do not spend hours or days trying to get it running a couple of milliseconds faster.</p>
<p>And if the query doesn&#8217;t perform acceptably, identify the problematic portion and fix that. Don&#8217;t flail around rewriting bits of the query in the hope that the problem will magically go away.</p>
<p>The execution plan is the primary tool here, along with the output of Statistics IO.</p>
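<p>As a minimal sketch of what that measurement looks like in practice: the SET options below are standard, but dbo.Orders and its columns are hypothetical placeholders for whatever query is being evaluated.</p>
<pre class="brush: sql; title: ; notranslate">-- Report reads per table and CPU/elapsed time for each statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- Hypothetical query under test; substitute the real one
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = 42;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO</pre>
<p>The figures appear in the Messages tab in Management Studio. Running the query with the actual execution plan included alongside gives the operators to compare against those reads and times.</p>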
<h3>Fixing existing code</h3>
<p>When evaluating existing databases with known performance problems, limit the performance tuning to queries that really are performing badly and need optimisation. It&#8217;s often true that fixing the top 5-10 worst performing queries will have a massive effect on overall system performance, far more than tuning twice that number of queries that aren&#8217;t really a problem.</p>
<p>The best tool for finding which queries really are the worst offenders is SQL Trace.</p>
<p>When looking at queries that are a problem, identify the portions that are inefficient and target attempts at optimisation towards those problems.</p>
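<p>A rough sketch of how a saved trace could be summarised to find the worst offenders. It assumes a server-side trace that captured the RPC:Completed and SQL:BatchCompleted events with the TextData, Duration, CPU and Reads columns; the file path is a made-up placeholder.</p>
<pre class="brush: sql; title: ; notranslate">-- Aggregate a trace file by statement text, heaviest readers first
SELECT TOP (10)
 CAST(TextData AS NVARCHAR(MAX)) AS QueryText,
 COUNT(*) AS Executions,
 SUM(Duration) / 1000 AS TotalDurationMs, -- Duration is stored in microseconds
 SUM(CPU) AS TotalCPUMs,
 SUM(Reads) AS TotalReads
FROM sys.fn_trace_gettable(N'C:\Traces\Workload.trc', DEFAULT) -- hypothetical path
WHERE EventClass IN (10, 12) -- 10 = RPC:Completed, 12 = SQL:BatchCompleted
GROUP BY CAST(TextData AS NVARCHAR(MAX))
ORDER BY TotalReads DESC;</pre>
<p>Note that this groups on the exact text, so calls that differ only in literal or parameter values are counted separately. Treat it as a starting point, not a finished analysis.</p>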
<h3>In conclusion</h3>
<p>Measure Twice.<br />
Optimise if necessary.</p>
<hr />
<p>(1) No, that wasn&#8217;t a typo.</p>
<p>(2) At that time. Later changes to schema or data volume may require existing queries to be revised.</p>
<p>For more details on exactly how to identify problematic queries, refer to the <a href="http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-1/">series I wrote at Simple Talk</a> last year.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/03/11/the-root-of-all-evil/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The most optimal join type</title>
		<link>http://sqlinthewild.co.za/index.php/2009/11/24/the-most-optimal-join-type/</link>
		<comments>http://sqlinthewild.co.za/index.php/2009/11/24/the-most-optimal-join-type/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 15:00:46 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=483</guid>
		<description><![CDATA[What&#8217;s the best join type for a query? Should we aspire to seeing nested loop joins in all our queries? Should we tremble with horror at the sight of a hash join? Well, it depends. There&#8217;s no single join type that&#8217;s best in every scenario and there&#8217;s no join type that&#8217;s bad to have in [...]]]></description>
			<content:encoded><![CDATA[<p>What&#8217;s the best join type for a query? Should we aspire to seeing nested loop joins in all our queries? Should we tremble with horror at the sight of a hash join?</p>
<p>Well, it depends. <img src='http://sqlinthewild.co.za/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>There&#8217;s no single join type that&#8217;s best in every scenario and there&#8217;s no join type that&#8217;s bad to have in every scenario. If one of the join types, say the much maligned hash join, was very much a sub-optimal join type in every single scenario, then there would be no reason for it to be in the product and no reason for the optimiser to ever select it for a plan. Since there are three join types, and the optimiser can and does use all three, we must assume that they are all useful under some circumstances.</p>
<p>I took a <a href="http://sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/">look at the joins</a> a while back, but it&#8217;s worth revisiting.</p>
<h3>The nested loop join</h3>
<p>A nested loop join is an optimal join type when two conditions are true.</p>
<ol>
<li>One of the resultsets contains quite a small number of rows.</li>
<li>The other table has an index on the join column(s).</li>
</ol>
<p>When both of these are true, SQL can do a very efficient nested loop. The smaller resultset becomes the outer table of the join, a loop runs across all the rows in that resultset and index seeks are done to look up the matching rows in the inner table. It&#8217;s important to note that the number of seeks against the inner table will not be less than the number of rows in the outer table at the point the join occurs.</p>
<p>If one resultset has a small number of rows but there is no index on the other table on the join column, then a loop join can still be done, but it is less optimal as the entirety of the inner table (or a subset based on another filter condition) must be read on each iteration of the loop.</p>
<p>If both resultsets have large numbers of rows but there is an index on the join columns in one of the tables then the nested loop can still read through one of the resultsets and do index seeks to locate matching rows, but the number of rows in the outer table will mean lots and lots of seek operations, which may result in a sub-optimal plan.</p>
<p><span id="more-483"></span></p>
<p>If both resultsets have large numbers of rows and there are no indexes on the join column(s) then the nested loop becomes a very sub-optimal join as the large number of rows means that the inner table will have to be read many, many times and the lack of indexes means that whichever table is picked as the inner table will have to be read in its entirety each time.</p>
<p>Bottom line: Forcing a nested loop join on queries that process larger numbers of rows can result in very poor performance, especially if the join column(s) are not indexed.</p>
<p>So let&#8217;s have a look at a couple of examples. Table creation code:</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE JoinTable1 (
  ID INT IDENTITY PRIMARY KEY,
  SomeString VARCHAR(4),
  RandomDate DATETIME
)

CREATE TABLE JoinTable2 (
  JoinString VARCHAR(4) PRIMARY KEY,
  Filler CHAR(100)
)

CREATE INDEX idx_Test ON JoinTable1 (SomeString, ID)</pre>
<p>I populated these with RedGate&#8217;s Data Generator, so I have no generation code. It shouldn&#8217;t be hard to create: 1 million rows in the first table, and the matching distinct values of the string column (512 of them in my case) in the second table.</p>
<p>If I join the two and have a filter that limits the resultset to a small number of rows, I get a loop join.</p>
<pre class="brush: sql; title: ; notranslate">SELECT SomeString
  FROM jointable1 j1 INNER JOIN
    jointable2 j2 ON j1.somestring = j2.joinstring
  WHERE ID BETWEEN 0 AND 25</pre>
<p>Table &#8216;JoinTable2&#8217;. Scan count 0, logical reads 50, physical reads 0.<br />
Table &#8216;JoinTable1&#8217;. Scan count 1, logical reads 3, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 0 ms,  elapsed time = 0 ms.</p>
<p>Great, now let&#8217;s up the rows involved.</p>
<pre class="brush: sql; title: ; notranslate">SELECT SomeString
  FROM jointable1 j1
    INNER JOIN jointable2 j2 ON j1.somestring = j2.joinstring
  WHERE ID BETWEEN 0 AND 5000</pre>
<p>This does a merge join by default (there are indexes that can be used to retrieve rows sorted on both sides). As a merge join the stats look like this:</p>
<p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;JoinTable1&#8217;. Scan count 1, logical reads 21, physical reads 0.<br />
Table &#8216;JoinTable2&#8217;. Scan count 1, logical reads 25, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 15 ms,  elapsed time = 209 ms.</p>
<p>Hmm… much higher duration than the first case. What if I force a loop join?</p>
<pre class="brush: sql; title: ; notranslate">SELECT SomeString
  FROM jointable1 j1
    INNER JOIN jointable2 j2 ON j1.somestring = j2.joinstring
  WHERE ID BETWEEN 0 AND 5000
  OPTION (LOOP JOIN)</pre>
<p>Table &#8216;JoinTable2&#8217;. Scan count 0, logical reads 10321, physical reads 0.<br />
Table &#8216;JoinTable1&#8217;. Scan count 1, logical reads 21, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 47 ms,  elapsed time = 220 ms.</p>
<p>Loop does not appear to be the optimal join type any more.</p>
<h3>The merge join</h3>
<p>The merge join requires that both resultsets are sorted by the join columns. It’s a join type that scales well with the size of the resultsets. The cost of the join comes when one or both resultsets have to be sorted before the join can be done. When both resultsets are already sorted by the join key (probably because of the index used to retrieve the data) then the merge join is quite an optimal join regardless of the number of rows. When the smaller of the two resultsets needs to be sorted before the join, a merge can still be relatively efficient. When both resultsets need sorting it is likely that a different join will be more optimal.</p>
<p>Let&#8217;s start with the same query as in the nested loop example, but with even more rows (in fact, all the rows).</p>
<pre class="brush: sql; title: ; notranslate">SELECT ID, SomeString
  FROM jointable1 j1
    INNER JOIN jointable2 j2 ON j1.somestring = j2.joinstring</pre>
<p>This by default results in a merge join, a scan of the nonclustered index on the large table and of the cluster on the smaller table.</p>
<p>Table &#8216;JoinTable1&#8217;. Scan count 1, logical reads 2109, physical reads 0.<br />
Table &#8216;JoinTable2&#8217;. Scan count 1, logical reads 19, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 1201 ms,  elapsed time = 11458 ms.</p>
<p>Most of the elapsed time is the time required to display the results. If I now go and change the order of columns in the nonclustered index, it can no longer be used to get the data in the correct order for the join. The index is still the same size and should still have the same scan cost. We now get the semi-expected hash join.</p>
<p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;JoinTable1&#8217;. Scan count 1, logical reads 2109, physical reads 0.<br />
Table &#8216;JoinTable2&#8217;. Scan count 1, logical reads 25, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 1685 ms,  elapsed time = 11726 ms.</p>
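<p>For anyone following along, the index change described above could be sketched like this. The exact new column order is an assumption on my part here (simply the two key columns swapped); any order that no longer leads with SomeString should have the same effect.</p>
<pre class="brush: sql; title: ; notranslate">-- Recreate the nonclustered index with the key columns swapped,
-- so it no longer returns rows ordered by the join column
DROP INDEX idx_Test ON JoinTable1
GO

CREATE INDEX idx_Test ON JoinTable1 (ID, SomeString)
GO</pre>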
<p>Ok, now what if I forced it back to a merge join? SQL would have to first sort a 1 million row table, then do the join.</p>
<p>Table &#8216;JoinTable1&#8217;. Scan count 1, logical reads 2109, physical reads 0.<br />
Table &#8216;JoinTable2&#8217;. Scan count 1, logical reads 19, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 6661 ms,  elapsed time = 21944 ms.</p>
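<p>The forced version of the query would look something like the sketch below. It is the same join as before with a query-level join hint added, mirroring the OPTION (LOOP JOIN) used earlier.</p>
<pre class="brush: sql; title: ; notranslate">SELECT ID, SomeString
  FROM jointable1 j1
    INNER JOIN jointable2 j2 ON j1.somestring = j2.joinstring
  OPTION (MERGE JOIN) -- restrict the optimiser to merge joins for this query</pre>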
<p>IOs are unchanged, it&#8217;s still just a single scan of each involved index, but look at that CPU cost.</p>
<p>One additional point about the merge join is that there must be at least one equi-join present. If the only join conditions between two tables are inequalities then merge is not a valid join type, and an attempt to force a merge join will result in an error.</p>
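<p>A small illustration of that restriction, using the test tables from above. The inequality join is contrived purely for the example; with the hint in place the query is expected to fail at compile time instead of returning rows.</p>
<pre class="brush: sql; title: ; notranslate">-- Only an inequality in the join condition, so a merge join is not possible.
-- Forcing one should produce an error saying no plan could be built.
SELECT j1.ID, j1.SomeString
  FROM JoinTable1 j1
    INNER JOIN JoinTable2 j2 ON j1.SomeString &gt; j2.JoinString
  WHERE j1.ID BETWEEN 0 AND 25
  OPTION (MERGE JOIN)</pre>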
<h3>The hash join</h3>
<p>This join type seems to be the one most hated. So often people are horrified by the appearance of a hash join in a query plan. “Should I force a loop join?” is asked.</p>
<p>Well, probably not.</p>
<p>A hash join is the heavy lifter of the join types. It’s the one that operates efficiently on huge, unsorted resultsets and it’s the one that parallelises best. Remember that a loop join is efficient if one of the resultsets has a small number of rows and merge joins are efficient if both resultsets are already sorted on the join column(s). When neither of those joins proves optimal, that’s when the hash join is used.</p>
<p>Sometimes the presence of a hash join suggests that there are missing indexes or that more rows are being processed than necessary. Other times it simply means that the number of rows involved in the query rules out the other two join types.</p>
<p>I&#8217;m not going to do any examples here because the hash join is the one join type that doesn&#8217;t perform terribly if called on smaller resultsets. It still works, and fairly well, it&#8217;s just that the other joins may be better in those cases.</p>
<p>Like with the merge join, the hash requires at least one equi-join to work and an attempt to force a hash join when there are only inequality join predicates will result in an error.</p>
<p>So next time someone asks &#8216;Is a hash join bad?&#8217;, the correct answer to give them is &#8216;It depends, how many rows is it joining?&#8217;</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2009/11/24/the-most-optimal-join-type/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Multiple Execution Paths</title>
		<link>http://sqlinthewild.co.za/index.php/2009/09/15/multiple-execution-paths/</link>
		<comments>http://sqlinthewild.co.za/index.php/2009/09/15/multiple-execution-paths/#comments</comments>
		<pubDate>Tue, 15 Sep 2009 10:00:14 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=284</guid>
		<description><![CDATA[It&#8217;s not uncommon to find stored procedures that have multiple IF statements controlling the flow of execution within the procedure. Now this seems to be a fairly logical thing to do, but there can be a subtle performance problem with this, one that may be hard to identify. Let&#8217;s have a look at a simple [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s not uncommon to find stored procedures that have multiple IF statements controlling the flow of execution within the procedure. Now this seems to be a fairly logical thing to do, but there can be a subtle performance problem with this, one that may be hard to identify.</p>
<p>Let&#8217;s have a look at a simple example (using AdventureWorks)</p>
<pre class="brush: sql; title: ; notranslate">CREATE PROCEDURE MultipleExecPaths (
@TransactionType char(1) = NULL
)
AS

IF @TransactionType IS NULL
SELECT max(transactionDate) from Production.TransactionHistory
ELSE
SELECT max(transactionDate) from Production.TransactionHistory
WHERE TransactionType = @TransactionType

GO</pre>
<p>Nice and simple. If the parameter is passed, get the latest date for that transaction type; if the parameter is not passed, i.e. is NULL, get the latest date over all transaction types. So what&#8217;s wrong with this?</p>
<p>The problem goes back to parameter sniffing. When the procedure is executed for the first time, all queries in the procedure are parsed, bound and optimised. When the optimiser processes each statement to generate an execution plan it uses the values passed for the various parameters to estimate the number of rows affected. The number of rows that the optimiser thinks the queries will process affects the choice of operators for the plan. Operators that are optimal for small numbers of rows are not always optimal for large numbers of rows, and sometimes the difference can be astounding.</p>
<p>Let&#8217;s see how the example above plays out to understand what&#8217;s happening here.</p>
<p><span id="more-284"></span>First thing I want to do is to run the two queries in that procedure separately to see how they behave in isolation. After that I&#8217;ll run the procedure and see how it behaves, both when the first call passes the parameter and when the first call doesn&#8217;t pass a parameter.</p>
<pre class="brush: sql; title: ; notranslate">SELECT max(transactionDate) from Production.TransactionHistory
GO
SELECT max(transactionDate) from Production.TransactionHistory
WHERE TransactionType = 'W'</pre>
<p>I&#8217;m hardcoding the parameter value so that we get much the same effect as we would with a parameter in a stored proc. If I used a variable, SQL wouldn&#8217;t be able to see the value of the variable at compile time and hence we wouldn&#8217;t necessarily get the same plan as we would in the procedure.</p>
<p>The first query, the one without a where clause executes with a clustered index scan. This isn&#8217;t surprising, there&#8217;s no index on the TransactionDate column. Execution statistics are</p>
<blockquote><p>Table &#8216;TransactionHistory&#8217;. Scan count 1, logical reads 3165, physical reads 0.  SQL Server Execution Times:   CPU time = 125 ms,  elapsed time = 131 ms.</p></blockquote>
<p>The second query, with  the filter on TransactionType also executes with a clustered index scan, even though there&#8217;s an index on TransactionType. Execution statistics are</p>
<blockquote><p>Table &#8216;TransactionHistory&#8217;. Scan count 1, logical reads 3165, physical reads 0.  SQL Server Execution Times:   CPU time = 110 ms,  elapsed time = 111 ms.</p></blockquote>
<p>Note that I have more data in my copy of AdventureWorks than is normal, so the execution times and IO statistics are higher than they would be with a normal copy of AW.</p>
<p>Why the clustered index scan when there&#8217;s an index? The index on TransactionType is not covering (it doesn&#8217;t have the TransactionDate column in it), and the filter on TransactionType returns 125000 rows out of the total of 450000 in the table. That&#8217;s around 28% of the table, far too high for the optimiser to consider a seek on a non-covering index and a whole load of key lookups.</p>
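<p>A quick sketch for checking how the rows are distributed across transaction types in your own copy of AdventureWorks. The numbers will differ from mine, since my table has extra data in it.</p>
<pre class="brush: sql; title: ; notranslate">-- Row count and share of the table for each transaction type
SELECT TransactionType,
 COUNT(*) AS RowsForType,
 100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS PercentOfTable
FROM Production.TransactionHistory
GROUP BY TransactionType
ORDER BY RowsForType DESC</pre>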
<p>So when run separately the two queries both execute in around 100ms and do just over 3000 reads. Now let&#8217;s see how they behave together in the stored procedure.</p>
<p>First test is going to be calling the procedure first time with a parameter</p>
<pre class="brush: sql; title: ; notranslate">EXEC MultipleExecPaths @TransactionType = 'W'
GO

EXEC MultipleExecPaths</pre>
<p>First Procedure call</p>
<blockquote><p>Table &#8216;TransactionHistory&#8217;. Scan count 1, logical reads 3165, physical reads 0.   SQL Server Execution Times:  CPU time = 109 ms,  elapsed time = 117 ms.</p></blockquote>
<p>Second procedure call</p>
<blockquote><p>Table &#8216;TransactionHistory&#8217;. Scan count 1, logical reads 3165, physical reads 0.   SQL Server Execution Times:  CPU time = 109 ms,  elapsed time = 111 ms.</p></blockquote>
<p>This looks fine. The execution statistics and execution plan are the same as in the case where I ran the two statements separately. Next test, I&#8217;m going to clear the procedure cache out and run the two procedure calls again, this time in the other order.</p>
<pre class="brush: sql; title: ; notranslate">DBCC FREEPROCCACHE
GO

EXEC MultipleExecPaths
GO

EXEC MultipleExecPaths @TransactionType = 'W'</pre>
<p>First Procedure call:</p>
<blockquote><p>Table &#8216;TransactionHistory&#8217;. Scan count 1, logical reads 3165, physical reads 0.   SQL Server Execution Times:  CPU time = 109 ms,  elapsed time = 111 ms.</p></blockquote>
<p>Second procedure call:</p>
<blockquote><p>Table &#8216;TransactionHistory&#8217;. Scan count 1, logical reads 372377, physical reads 0.   SQL Server Execution Times:  CPU time = 265 ms,  elapsed time = 266 ms.</p></blockquote>
<p>Oops. What happened here? The reads have gone through the roof and the query duration has more than doubled. Let&#8217;s have a look at the execution plan, see if there&#8217;s a hint there.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2009/09/MultipleExecPaths1.png"><img class="alignnone size-medium wp-image-314" style="border: 1px solid black;" title="Multiple Execution Paths" src="http://sqlinthewild.co.za/wp-content/uploads/2009/09/MultipleExecPaths1-300x80.png" alt="Multiple Execution Paths" width="300" height="80" /></a></p>
<p>Remember I said earlier that since the filter on transaction type affected a large percentage of the table, the optimiser wouldn&#8217;t choose an index seek with a key lookup? Well, that&#8217;s exactly what it&#8217;s done here. The question is, why? There&#8217;s a partial answer in the properties of the index seek.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2009/09/MultipleExecPaths2.png"><img class="alignnone size-medium wp-image-315" title="Multiple Execution Paths" src="http://sqlinthewild.co.za/wp-content/uploads/2009/09/MultipleExecPaths2-273x300.png" alt="Multiple Execution Paths" width="273" height="300" /></a></p>
<p>Estimated rows 1. Actual rows 124008. Yeah, that&#8217;s going to mess up the choice of plan. A quick dig into the xml plan gives a very large clue as to what&#8217;s going on here.</p>
<pre class="brush: xml; title: ; notranslate">&lt;ParameterList&gt;
&lt;ColumnReference Column=&quot;@TransactionType&quot; ParameterCompiledValue=&quot;NULL&quot; ParameterRuntimeValue=&quot;'W'&quot; /&gt;
&lt;/ParameterList&gt;</pre>
<p>ParameterCompiledValue=&quot;NULL&quot;, ParameterRuntimeValue=&quot;'W'&quot;. Since no rows will ever satisfy the condition TransactionType = NULL (assuming the default ANSI_NULLS setting), the optimiser compiled the plan for 1 row.</p>
<p>What&#8217;s happened here is that when the procedure was run the first time and the optimiser generated the execution plan, it optimised all of the queries in the procedure based on the values of the parameters passed for that execution, regardless of whether the query could be executed with that particular set of parameters. So in this case, when the procedure was first executed and the execution plan was generated, it was generated for both queries based on a parameter value of NULL, even though the second branch of the IF could not be reached with that parameter value.</p>
<p>The important thing to take away from this is that when a procedure is compiled, the entire thing is compiled and all queries in it are optimised based on the parameter values for that call. If some queries will only be executed for certain parameter values then it can be that those queries will get very sub-optimal plans.</p>
<p>Great. We know the cause of the problem. What&#8217;s the fix?</p>
<p>Well, there are several options. The usual fixes for parameter sniffing can work here: the OPTION (RECOMPILE) and OPTION (OPTIMIZE FOR&#8230;) hints are both useful. If a query has OPTION (RECOMPILE) then its plan is never cached, so it will always be compiled with an appropriate parameter value. OPTIMIZE FOR can be used to override the optimiser&#8217;s parameter sniffing: regardless of what the actual parameter value is, the optimiser will use the hinted value instead.</p>
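<p>As a rough sketch of what the two hints could look like here (note that the T-SQL keyword is spelled OPTIMIZE FOR). This assumes the original procedure has the two queries inline in the branches of the IF. The procedure names MultipleExecPaths_Recompile and MultipleExecPaths_OptimizeFor are made up for illustration, and &#8216;W&#8217; is just the value from the example above, not a recommendation.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical variant 1: recompile the filtered query on every execution,
-- so it is always optimised for the value actually passed
CREATE PROCEDURE MultipleExecPaths_Recompile (
@TransactionType char(1) = NULL
)
AS

IF @TransactionType IS NULL
SELECT max(transactionDate) from Production.TransactionHistory
ELSE
SELECT max(transactionDate) from Production.TransactionHistory
WHERE TransactionType = @TransactionType
OPTION (RECOMPILE)

GO

-- Hypothetical variant 2: always optimise the filtered query as if 'W' had been passed,
-- whatever value the procedure was first compiled with
CREATE PROCEDURE MultipleExecPaths_OptimizeFor (
@TransactionType char(1) = NULL
)
AS

IF @TransactionType IS NULL
SELECT max(transactionDate) from Production.TransactionHistory
ELSE
SELECT max(transactionDate) from Production.TransactionHistory
WHERE TransactionType = @TransactionType
OPTION (OPTIMIZE FOR (@TransactionType = 'W'))

GO</pre>
<p>The trade-off is that RECOMPILE pays the compilation cost on every execution of that query, while OPTIMIZE FOR is only as good as the value chosen for the hint.</p>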
<p>There&#8217;s also another solution for this problem, one I personally favour. Sub-procedures. Because a stored procedure is only compiled when it&#8217;s executed, moving the contents of the branches of the IF statement into separate procedures and calling those procedures from the branches of the IF completely eliminates the chance of an inappropriate plan caused by this problem. It also prevents the optimiser from doing unnecessary work (optimising the entire procedure when only part will be run). So using this solution, the original procedure could be modified to this.</p>
<pre class="brush: sql; title: ; notranslate">CREATE PROCEDURE MaxDateNoFilter AS
SELECT max(transactionDate) from Production.TransactionHistory
GO

CREATE PROCEDURE MaxDateWithFilter (
@TransactionType char(1) -- doesn't need null default, because will not be called with null
)
AS
SELECT max(transactionDate) from Production.TransactionHistory
WHERE TransactionType = @TransactionType
GO

CREATE PROCEDURE MultipleExecPaths (
@TransactionType char(1) = NULL
)
AS

IF @TransactionType IS NULL
EXEC MaxDateNoFilter
ELSE
EXEC MaxDateWithFilter @TransactionType

GO</pre>
<p>Now each sub-procedure gets its own separate cached plan and no matter how the outer procedure is called the first time, the plans will be optimal for the parameters that the sub-procedures are actually executed with.</p>
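<p>One way to check this is to run the outer procedure both ways and then look at what is in the procedure cache. This is only a sketch: it assumes SQL Server 2008 or later (for sys.dm_exec_procedure_stats), VIEW SERVER STATE permission, and that it is run in the database where the three procedures above were created.</p>
<pre class="brush: sql; title: ; notranslate">-- Call the outer procedure down both branches of the IF
EXEC MultipleExecPaths
EXEC MultipleExecPaths @TransactionType = 'W'
GO

-- List the cached plans for the outer procedure and the two sub-procedures
SELECT OBJECT_NAME(ps.object_id, ps.database_id) AS ProcedureName,
ps.cached_time,
ps.execution_count
FROM sys.dm_exec_procedure_stats ps
WHERE ps.database_id = DB_ID()
AND OBJECT_NAME(ps.object_id, ps.database_id) IN ('MultipleExecPaths', 'MaxDateNoFilter', 'MaxDateWithFilter')</pre>
<p>Each procedure that has been executed should show up as its own row, and a sub-procedure only appears once the branch that calls it has actually run.</p>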
<p>Edit: If you&#8217;re trying to reproduce my results, make sure there&#8217;s an index on TransactionType. Without that, all queries execute with a table scan.</p>
<pre class="brush: sql; title: ; notranslate">CREATE NONCLUSTERED INDEX [IX_TransactionHistory_TransactionType]
ON [Production].[TransactionHistory] ([TransactionType] ASC)</pre>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2009/09/15/multiple-execution-paths/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
	</channel>
</rss>

