<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SQL in the Wild &#187; T-SQL</title>
	<atom:link href="http://sqlinthewild.co.za/index.php/category/sql-server/t-sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://sqlinthewild.co.za</link>
	<description>A discussion on SQL Server</description>
	<lastBuildDate>Wed, 25 Apr 2012 14:45:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Goodbye IsNumeric hell</title>
		<link>http://sqlinthewild.co.za/index.php/2011/07/26/goodbye-isnumeric-hell/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/07/26/goodbye-isnumeric-hell/#comments</comments>
		<pubDate>Tue, 26 Jul 2011 14:30:00 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=1223</guid>
		<description><![CDATA[A well overdue feature introduced in Denali CTP 3 is that of the Try_Parse and Try_Convert functions. These are great for dealing with something that’s frustrated SQL developers for years – data type conversions. Let’s imagine a rather nasty case, a file with some values in it that once imported into SQL (as character [...]]]></description>
			<content:encoded><![CDATA[<p>A well overdue feature introduced in Denali CTP 3 is that of the Try_Parse and Try_Convert functions. These are great for dealing with something that’s frustrated SQL developers for years – data type conversions.</p>
<p>Let’s imagine a rather nasty case, a file with some values in it that once imported into SQL (as character data) looks something like this:</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/07/BadNumerics.png"><img style="display: inline; border-width: 0px;" title="BadNumerics" src="http://sqlinthewild.co.za/wp-content/uploads/2011/07/BadNumerics_thumb.png" border="0" alt="BadNumerics" width="128" height="145" /></a></p>
<p>Ewww… that’s a mess. Let’s see if we can identify which of the values can be converted into a numeric data type. The function prior to Denali for that was ISNUMERIC.</p>
<pre class="brush: sql; title: ; notranslate">SELECT ANumber, ISNUMERIC(ANumber)
FROM BadNumerics   </pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/07/IsNumeric.png"><img style="display: inline; border-width: 0px;" title="IsNumeric" src="http://sqlinthewild.co.za/wp-content/uploads/2011/07/IsNumeric_thumb.png" border="0" alt="IsNumeric" width="226" height="143" /></a></p>
<p>Great, so other than the obvious one, they’re all numeric data. Time to get converting.</p>
<pre class="brush: sql; title: ; notranslate">SELECT CAST(ANumber as Numeric(18,6))
  FROM BadNumerics
  WHERE ISNUMERIC(ANumber) = 1;</pre>
<blockquote><p><span style="color: #ff0000;">Msg 8114, Level 16, State 5, Line 1<br />
Error converting data type varchar to numeric.</span></p></blockquote>
<p>Err, so they’re numeric but can’t be converted to numeric. That’s fun. Maybe another of the numeric data types will work…</p>
<p><span id="more-1223"></span></p>
<pre class="brush: sql; title: ; notranslate">SELECT CAST(ANumber as FLOAT)
  FROM BadNumerics
  WHERE ISNUMERIC(ANumber) = 1;</pre>
<blockquote><p><span style="color: #ff0000;">Msg 8114, Level 16, State 5, Line 1<br />
Error converting data type varchar to float.</span></p></blockquote>
<p>Not that one either. Money perhaps, it’s usually more forgiving than float or numeric.</p>
<pre class="brush: sql; title: ; notranslate">SELECT CAST(ANumber as MONEY)
  FROM BadNumerics
  WHERE ISNUMERIC(ANumber) = 1;</pre>
<blockquote><p><span style="color: #ff0000;">Msg 235, Level 16, State 0, Line 1<br />
Cannot convert a char value to money. The char value has incorrect syntax.</span></p></blockquote>
<p>Different error, same problem. The issue here is that there’s no single data type that all of those messy values can convert to. 23,90 can convert to money, not float or numeric. 2.34e-3 and 2d6 both convert fine to float, not to money or numeric.</p>
<p>So ISNUMERIC is a bit of a misleading function. All it tells you is that the value can be cast to at least one of the numeric data types in SQL; it’s up to you to figure out which one.</p>
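<p>A quick way to see this is to test a few strings directly. This is only a sketch: the first three values are the ones discussed above, the lone currency symbol and decimal point are extra examples added here for illustration, and the inline VALUES list is a stand-in for the real BadNumerics table.</p>
<pre class="brush: sql; title: ; notranslate">-- Stand-in for BadNumerics; the last two values are added for illustration only
SELECT v.ANumber, ISNUMERIC(v.ANumber) AS LooksNumeric
  FROM (VALUES ('23,90'), ('2.34e-3'), ('2d6'), ('$'), ('.')) AS v (ANumber);</pre>
<p>Each of those should come back flagged as numeric, even though no single numeric data type accepts all of them.</p>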
<p>Pre-Denali a data set like that would be a terrible mess to get converted (though I’d probably just send it back to where it came from and ask for it to be cleaned at the source). In Denali though, it’s not all that bad.</p>
<pre class="brush: sql; title: ; notranslate">SELECT ANumber, TRY_CONVERT(NUMERIC(18,6), ANumber)
  FROM BadNumerics;</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/07/TryConvert1.png"><img style="display: inline; border-width: 0px;" title="TryConvert1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/07/TryConvert1_thumb.png" border="0" alt="TryConvert1" width="229" height="141" /></a></p>
<p>Now that’s more like it… We can do the same with the other numeric data types.</p>
<pre class="brush: sql; title: ; notranslate">SELECT ANumber,
    TRY_CONVERT(NUMERIC(18,6), ANumber) as TryNumeric,
    TRY_CONVERT(FLOAT, ANumber) AS TryFloat,
    TRY_CONVERT(MONEY, ANumber) AS TryMoney
  FROM BadNumerics;</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/07/TryNumeric2.png"><img style="display: inline; border-width: 0px;" title="TryNumeric2" src="http://sqlinthewild.co.za/wp-content/uploads/2011/07/TryNumeric2_thumb.png" border="0" alt="TryNumeric2" width="283" height="130" /></a></p>
<p>Other than the one that’s clearly not a number of any form, the rest convert to at least one of the data types, and now I have an error-free way to convert everything that converts and ignore the rest without having to write nasty validation functions of my own.</p>
<pre class="brush: sql; title: ; notranslate">WITH ConvertAMess (Original, ConvertedNumber) AS (
  SELECT ANumber,
      COALESCE(TRY_CONVERT(NUMERIC(18,6), ANumber),TRY_CONVERT(FLOAT, ANumber),TRY_CONVERT(MONEY, ANumber))
    FROM BadNumerics
)
SELECT  Original, ConvertedNumber
  FROM ConvertAMess
  WHERE ConvertedNumber IS NOT NULL;</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/07/UltimateTry.png"><img style="display: inline; border-width: 0px;" title="UltimateTry" src="http://sqlinthewild.co.za/wp-content/uploads/2011/07/UltimateTry_thumb.png" border="0" alt="UltimateTry" width="201" height="121" /></a></p>
<p>Not perfect, but a darn sight better than what we started with. That’s going to simplify the kind of data-cleansing imports that I’ve been seeing recently.</p>
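<p>The same functions also make it easy to list the rejects so that they can be sent back for cleaning. A minimal sketch, assuming the same BadNumerics table and the same three target types used above:</p>
<pre class="brush: sql; title: ; notranslate">-- Rows that none of the three conversions can handle
SELECT ANumber
  FROM BadNumerics
  WHERE ANumber IS NOT NULL
    AND TRY_CONVERT(NUMERIC(18,6), ANumber) IS NULL
    AND TRY_CONVERT(FLOAT, ANumber) IS NULL
    AND TRY_CONVERT(MONEY, ANumber) IS NULL;</pre>
<p>The IS NOT NULL check is there so that genuinely missing values are not reported as failed conversions.</p>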
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/07/26/goodbye-isnumeric-hell/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Converting OR to Union</title>
		<link>http://sqlinthewild.co.za/index.php/2011/07/05/converting-or-to-union/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/07/05/converting-or-to-union/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 15:30:00 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=963</guid>
		<description><![CDATA[When I looked at indexing for queries containing predicates combined with OR, it became clear that there are some restrictive requirements for indexes for the optimiser to consider using the indexes for seek operations. Each predicate (or set of predicates) combined with an OR must have a separate index All of those indexes must be [...]]]></description>
			<content:encoded><![CDATA[<p>When I looked at <a href="http://sqlinthewild.co.za/index.php/2011/05/03/indexing-for-ors/">indexing for queries containing predicates combined with OR</a>, it became clear that there are some restrictive requirements that indexes must meet before the optimiser will consider using them for seek operations.</p>
<ul>
<li>Each predicate (or set of predicates) combined with an OR must have a separate index</li>
<li>All of those indexes must be covering, or the row count of the concatenated result set must be low enough to make key lookups an option, as the optimiser does not appear to consider the possibility of doing key lookups for a subset of the predicates before concatenating the result sets.</li>
</ul>
<p>So what can be done if it&#8217;s not possible to meet those requirements?</p>
<p>The standard trick is to convert the query with ORs into multiple queries combined with UNION. The idea is that since OR predicates are evaluated separately and the result sets concatenated, we can do that manually by writing the queries separately and concatenating them using UNION or UNION ALL. (UNION ALL can only be safely used if the predicates are known to be mutually exclusive)</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE Persons (
PersonID INT IDENTITY PRIMARY KEY,
FirstName    VARCHAR(30),
Surname VARCHAR(30),
Country CHAR(3),
RegistrationDate DATE
)

CREATE INDEX idx_Persons_FirstName ON dbo.Persons (FirstName) INCLUDE (Surname)
CREATE INDEX idx_Persons_Surname ON dbo.Persons (Surname) INCLUDE (FirstName)
GO

-- Data population using SQLDataGenerator

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel' OR Surname = 'Barnes'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel'
UNION
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Barnes'</pre>
<p>In this case, the OR can be replaced with a UNION and the results are the same. The UNION form is slightly less efficient according to the execution plan&#8217;s costings (60% compared to the OR at 40%), and the two queries have the same general form, with two index seeks followed by some form of concatenation and duplicate removal.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult1.png"><img style="display: inline; border-width: 0px;" title="OrResult1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult1_thumb.png" border="0" alt="OrResult1" width="124" height="320" /></a><br />
<a href="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrExecPlan1.png"><img style="display: inline; border-width: 0px;" title="OrExecPlan1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrExecPlan1_thumb.png" border="0" alt="OrExecPlan1" width="484" height="298" /></a></p>
<p>So in that case it worked fine, although the original form was a little more efficient.<br />
<span id="more-963"></span><br />
Some care does need to be taken however, as the query with OR and the query with UNION may not always be equivalent, and it has to do with the elimination of duplicate rows.</p>
<p>In an OR, if a row qualifies for both of the predicates, it&#8217;s only returned once. That should be obvious, it&#8217;s how things should work, we don&#8217;t want to see the row multiple times just because it qualifies for more than one of the OR predicates. If we change that to UNION ALL then the row will be returned twice, it appears in both queries that are concatenated, and UNION ALL means combine without eliminating duplicates.</p>
<pre class="brush: sql; title: ; notranslate">SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Herman' OR Surname = 'Anderson'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Herman'
UNION ALL
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Anderson'</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult2a.png"><img style="display: inline; border-width: 0px;" title="OrResult2a" src="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult2a_thumb.png" border="0" alt="OrResult2a" width="124" height="292" /></a></p>
<p>In that example, Herman Anderson appears once in the results of the OR query and twice in the results of the UNION ALL. That&#8217;s because it qualifies for both predicates. The OR eliminated the duplication, the UNION ALL does not.</p>
<p>So change that UNION ALL to UNION so that the elimination of duplicate rows is done, the row appears only once and life is good again. Or is it?</p>
<pre class="brush: sql; title: ; notranslate">SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Alfred' OR Surname = 'Hickman'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Alfred'
UNION
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Hickman'
ORDER BY FirstName, Surname</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult2b.png"><img style="display: inline; border-width: 0px;" title="OrResult2b" src="http://sqlinthewild.co.za/wp-content/uploads/2011/06/OrResult2b_thumb.png" border="0" alt="OrResult2b" width="124" height="307" /></a></p>
<p>This time, Alfred Hickman appears twice in the results from the OR, but only once in the output from the UNION.</p>
<p>The difference comes in how the duplicates are eliminated. With an OR, SQL does the elimination of duplicates based on the key value, regardless of what may be in the select list. With a UNION, SQL does the elimination of duplicates based on the select list, regardless of what the key value may be. In the above example there were two rows in the table with the value ‘Alfred Hickman’, so with UNION you can lose rows if they are duplicated in the table.</p>
<p>The solution&#8217;s fairly simple: if converting an OR into a UNION, ensure that the key column(s) are in the select list. Then the duplicate elimination done by the UNION will only remove rows that were part of both result sets, instead of also removing ones that really do appear twice in the table.</p>
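<p>As a sketch of that fix, here is the earlier Alfred/Hickman query with the primary key added to both halves of the UNION. It assumes the Persons table defined above, where PersonID is the key.</p>
<pre class="brush: sql; title: ; notranslate">-- PersonID in the select list keeps genuinely separate rows distinct
SELECT PersonID, FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Alfred'
UNION
SELECT PersonID, FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Hickman'
ORDER BY FirstName, Surname</pre>
<p>With PersonID in the select list, two different people who both happen to be called Alfred Hickman are distinct rows as far as the UNION is concerned, so both are returned, while a single row that matches both predicates is still returned only once.</p>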
<p>So in conclusion, if you&#8217;re replacing a query using OR with a query using UNION, be careful with the finer details around duplicates. If you know the conditions are mutually exclusive, use UNION ALL. If you don&#8217;t, use UNION and ensure that the table&#8217;s key column(s) are present in the select list so that the UNION doesn&#8217;t remove rows that you don&#8217;t want it to remove.</p>
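<p>If the conditions are not naturally mutually exclusive, one further option is to make them exclusive by hand so that UNION ALL becomes safe. This is a sketch of that idea only, not something measured for this post: the second query explicitly excludes the rows that the first has already returned, and the IS NULL check is needed because FirstName is nullable.</p>
<pre class="brush: sql; title: ; notranslate">SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Herman'
UNION ALL
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Anderson'
  -- skip rows already returned by the first query
  AND (FirstName != 'Herman' OR FirstName IS NULL)</pre>
<p>Because no duplicate elimination is done at all, rows that really do appear twice in the table are kept, just as they are with the OR. Whether this performs better than the UNION form would need to be checked against the execution plans.</p>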
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/07/05/converting-or-to-union/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>On Transactions, errors and rollbacks</title>
		<link>http://sqlinthewild.co.za/index.php/2011/05/17/on-transactions-errors-and-rollbacks/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/05/17/on-transactions-errors-and-rollbacks/#comments</comments>
		<pubDate>Tue, 17 May 2011 14:30:15 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=1006</guid>
		<description><![CDATA[Do errors encountered within a transaction result in a rollback? It seems, at first, to be a simple question with an obvious answer. Transactions are supposed to be atomic, either the entire transaction completes or none of it completes. Maybe too simple… If a transaction rolled back at the first failure, that final select would [...]]]></description>
			<content:encoded><![CDATA[<p>Do errors encountered within a transaction result in a rollback?</p>
<p>It seems, at first, to be a simple question with an obvious answer. Transactions are supposed to be atomic, either the entire transaction completes or none of it completes.</p>
<p>Maybe too simple…</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE TestingTransactionRollbacks (
 ID INT NOT NULL PRIMARY KEY ,
 SomeDate DATETIME DEFAULT GETDATE()
 ) ;
GO
BEGIN TRANSACTION
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (1)
-- Fails. Cannot insert null into a non-null column
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (NULL)
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
-- fails. Duplicate key
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (3)
COMMIT TRANSACTION
GO
SELECT ID, SomeDate FROM TestingTransactionRollbacks
GO
DROP TABLE TestingTransactionRollbacks
</pre>
<p>If a transaction rolled back at the first failure, that final select would return no rows. But it doesn&#8217;t, it returns 3 rows. The failure of the individual statements was ignored and the transaction completed and committed. If that had been an important business process, not a made-up example, that could have some nasty consequences for transactional consistency of data.</p>
<p>What&#8217;s really going on here? Aren&#8217;t transactions supposed to be atomic? Isn&#8217;t SQL supposed to roll them back if they don&#8217;t complete successfully?</p>
<p>Well, kinda.</p>
<p><span id="more-1006"></span></p>
<p>Books Online <a href="http://msdn.microsoft.com/en-us/library/ms174377%28v=SQL.105%29.aspx">states</a></p>
<blockquote><p>A transaction is a single unit of work. If a transaction is successful, all of the data modifications made during the transaction are committed and become a permanent part of the database. If a transaction encounters errors and must be canceled or rolled back, then all of the data modifications are erased.</p></blockquote>
<p>That suggests that indeed the transaction should roll back automatically, however it also <a href="http://msdn.microsoft.com/en-us/library/ms175523.aspx">states</a></p>
<blockquote><p>If the client&#8217;s network connection to an instance of the Database Engine is broken, any outstanding transactions for the connection are rolled back when the network notifies the instance of the break.</p>
<p>If a run-time statement error (such as a constraint violation) occurs in a batch, the default behavior in the Database Engine is to roll back only the statement that generated the error.</p></blockquote>
<p>The default behaviour is to roll back only the statement that generated the error. Not the entire transaction.</p>
<p>A transaction will be rolled back if the connection closes (network error, client disconnect, high-severity error) and the commit was not reached. A transaction will be rolled back if the SQL Server terminates (shutdown, power failure, unexpected termination) and the commit was not reached. Under default settings, a non-fatal error thrown by a statement within a transaction will not automatically cause a rollback. (Fatal = severity 20 and above, which closes the connection.)</p>
<p>So what can we do if we do want a transaction to completely roll back if any error is encountered during the execution?</p>
<p>There are two options.<br />
1) Use the Xact_Abort setting<br />
2) Catch and handle the error, and specify a rollback within the error handling</p>
<h3>Xact_Abort</h3>
<p>From Books Online:</p>
<blockquote><p>When SET XACT_ABORT is ON, if a Transact-SQL statement raises a run-time error, the entire transaction is terminated and rolled back.</p>
<p>When SET XACT_ABORT is OFF, in some cases only the Transact-SQL statement that raised the error is rolled back and the transaction continues processing. Depending upon the severity of the error, the entire transaction may be rolled back even when SET XACT_ABORT is OFF. OFF is the default setting.</p></blockquote>
<p>Sounds simple enough. Let&#8217;s try the example from above with Xact_Abort on.</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE TestingTransactionRollbacks (
 ID INT NOT NULL PRIMARY KEY ,
 SomeDate DATETIME DEFAULT GETDATE()
 ) ;
GO
SET XACT_ABORT ON
GO

BEGIN TRANSACTION
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (1)
-- Fails. Cannot insert null into a non-null column
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (NULL)
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
-- fails. Duplicate key
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (3)
COMMIT TRANSACTION
GO
SELECT ID, SomeDate FROM TestingTransactionRollbacks
GO
DROP TABLE TestingTransactionRollbacks
</pre>
<p>Now the first of the run-time errors results in the entire transaction rolling back.</p>
<p>This is great if all you want is the transaction rolled back if an error occurs and aren&#8217;t interested in any additional error handling or logging.</p>
<h3>Error Handling</h3>
<p>Error handling used to be an absolute pain in SQL 2000. With no automatic error trapping in that version, error handling was limited to checking the value of @@error after each statement and using GOTO.</p>
<p>Fortunately in newer versions of SQL, there&#8217;s the TRY &#8230; CATCH construct. Not quite as fully-functional as the form that many front-end languages have (no finally block, no ability to catch specific classes of exceptions and ignore others) but still far, far better than what we had before.</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE TestingTransactionRollbacks (
 ID INT NOT NULL
 PRIMARY KEY ,
 SomeDate DATETIME DEFAULT GETDATE()
 ) ;
GO

BEGIN TRANSACTION
BEGIN TRY
 -- succeeds
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (1)
 -- Fails. Cannot insert null into a non-null column
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (NULL)
 -- succeeds
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
 -- fails. Duplicate key
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
 -- succeeds
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (3)
 COMMIT TRANSACTION
END TRY
BEGIN CATCH
 ROLLBACK TRANSACTION
END CATCH
GO
SELECT ID, SomeDate FROM TestingTransactionRollbacks
GO
DROP TABLE TestingTransactionRollbacks</pre>
<p>The first exception transfers execution into the Catch block, the transaction is then rolled back and when the select runs there are 0 rows in the table.</p>
<p>This looks like it does the same as Xact_Abort, just with far more typing, but there are advantages to handling the errors rather than just letting SQL roll the transaction back automatically. The catch block is not limited to just rolling back the transaction, it can log to error tables (after the rollback, so that the logging is not rolled back), it can take compensating actions, and it&#8217;s not even required to roll the transaction back (in most cases).</p>
<p>One of the reasons for using a catch block is that there are a number of error-related functions that only return data when they are called from within a catch block. These functions make it possible to create a friendly error and raise that (using raiserror) so that the client application doesn&#8217;t get the default SQL error messages. It&#8217;s also possible to check what error was thrown and behave differently for different errors (though not as easily as in languages like C# which allow catching of exception classes).</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE TestingTransactionRollbacks (
 ID INT NOT NULL
 PRIMARY KEY ,
 SomeDate DATETIME DEFAULT GETDATE()
 ) ;
GO

-- Wrapped in a procedure so that ERROR_PROCEDURE() has a name to report
CREATE PROCEDURE InsertWithError
AS
BEGIN TRANSACTION
BEGIN TRY
 -- succeeds
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (1)
 -- Fails. Cannot insert null into a non-null column
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (NULL)
 -- succeeds
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
 -- fails. Duplicate key
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
 -- succeeds
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (3)
 COMMIT TRANSACTION
END TRY
BEGIN CATCH
  ROLLBACK TRANSACTION
  -- These functions only return data inside the catch block
  SELECT  ERROR_NUMBER() AS ErrorNumber,
          ERROR_SEVERITY() AS Severity,
          ERROR_MESSAGE() AS ErrorMessage,
          ERROR_LINE() AS ErrorLine,
          ERROR_PROCEDURE() AS ErrorProcedure
END CATCH
GO

EXEC InsertWithError

GO
DROP TABLE TestingTransactionRollbacks
DROP PROCEDURE InsertWithError</pre>
<p>With those functions, the exact error text can be logged to a table for further analysis, along with the line and the procedure that the error occurred in, and then a friendly error can be sent back to the user.</p>
<p>Just one thing, of course, if using a logging table the insert should be done after the transaction rollback, or temporarily inserted into a table variable so as to not be affected by the rollback.</p>
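<p>As a rough sketch of that logging pattern: the ErrorLog table below is hypothetical (its name and columns are assumptions made for this example), and TestingTransactionRollbacks is assumed to exist as created above. The error details are captured into variables, the transaction is rolled back, and only then is the log row written.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical logging table; name and columns are assumptions for this sketch
CREATE TABLE ErrorLog (
  ErrorLogID INT IDENTITY PRIMARY KEY,
  ErrorNumber INT,
  ErrorSeverity INT,
  ErrorLine INT,
  ErrorProcedure NVARCHAR(128),
  ErrorMessage NVARCHAR(4000),
  LoggedAt DATETIME DEFAULT GETDATE()
) ;
GO

BEGIN TRY
  BEGIN TRANSACTION
  -- Fails. Cannot insert null into a non-null column
  INSERT INTO TestingTransactionRollbacks (ID)
  VALUES (NULL)
  COMMIT TRANSACTION
END TRY
BEGIN CATCH
  -- Capture the error details while still inside the catch block
  DECLARE @ErrorNumber INT, @ErrorSeverity INT, @ErrorLine INT,
          @ErrorProcedure NVARCHAR(128), @ErrorMessage NVARCHAR(4000)
  SELECT  @ErrorNumber = ERROR_NUMBER(), @ErrorSeverity = ERROR_SEVERITY(),
          @ErrorLine = ERROR_LINE(), @ErrorProcedure = ERROR_PROCEDURE(),
          @ErrorMessage = ERROR_MESSAGE()

  -- Roll back first so that the log row is not undone along with everything else
  IF @@TRANCOUNT > 0
    ROLLBACK TRANSACTION

  INSERT INTO ErrorLog (ErrorNumber, ErrorSeverity, ErrorLine, ErrorProcedure, ErrorMessage)
  VALUES (@ErrorNumber, @ErrorSeverity, @ErrorLine, @ErrorProcedure, @ErrorMessage)

  -- Friendly message for the client instead of the default SQL error
  RAISERROR('The record could not be saved. The problem has been logged.', 16, 1)
END CATCH
GO</pre>
<p>Local variables, like table variables, are not affected by the rollback, which is why the captured values are still available for the insert into the log table.</p>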
<p>One last thing that does need mentioning is the concept of a doomed transaction. This is a transaction that, once execution is transferred to the catch block, must be rolled back. The easiest way to see this in action is to combine Xact_Abort and a Try-Catch block.</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE TestingTransactionRollbacks (
 ID INT NOT NULL PRIMARY KEY ,
 SomeDate DATETIME DEFAULT GETDATE()
 ) ;
GO

SET XACT_ABORT ON ;

BEGIN TRANSACTION
BEGIN TRY
 -- succeeds
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (1)
 -- Fails. Cannot insert null into a non-null column
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (NULL)
 -- succeeds
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
 -- fails. Duplicate key
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
 -- succeeds
 INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (3)
 COMMIT TRANSACTION
END TRY
BEGIN CATCH
 COMMIT TRANSACTION
END CATCH
GO
SELECT ID, SomeDate FROM TestingTransactionRollbacks
GO
DROP TABLE TestingTransactionRollbacks</pre>
<p>In this case I&#8217;m ignoring the error and committing anyway. Probably not something that will be done often in real systems, but just for demonstration purposes it&#8217;ll serve.</p>
<p>Running this, however, returns another error (one thrown in the catch block).</p>
<blockquote><p>Msg 3930, Level 16, State 1, Line 24<br />
The current transaction cannot be committed and cannot support operations that write to the log file. Roll back the transaction.</p></blockquote>
<p>So how do you check for this? The built-in function XACT_STATE() will tell us the state of the transaction. A value of 1 means that the transaction can be committed, a value of -1 means that the transaction is doomed and can only be rolled back, and a value of 0 means that there is no active transaction.</p>
<p>Replacing the catch block with the following allows the code to run without error.</p>
<pre class="brush: sql; title: ; notranslate">BEGIN CATCH
  IF XACT_STATE() = 1
    COMMIT TRANSACTION
  IF XACT_STATE() = -1
    ROLLBACK TRANSACTION
END CATCH</pre>
<p>Now this is only half the story, as I haven&#8217;t touched on nested transactions at all. That&#8217;s an entire post of its own though.</p>
<p>In conclusion, while SQL does not provide the rich exception handling of front-end applications, what it does provide is adequate for good error handling, especially in conjunction with transactions that must commit or roll back as atomic units.</p>
<p>All the error handling in the world, however, will not help if it is not used, and leaving it out and just hoping the code will run correctly every time is never a good development practice.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/05/17/on-transactions-errors-and-rollbacks/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>A Trio of Table Variables</title>
		<link>http://sqlinthewild.co.za/index.php/2010/10/12/a-trio-of-table-variables/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/10/12/a-trio-of-table-variables/#comments</comments>
		<pubDate>Tue, 12 Oct 2010 14:00:48 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=687</guid>
		<description><![CDATA[So, it&#8217;s the second Tuesday of the month again, and it&#8217;s time for T-SQL Tuesday again. This month it&#8217;s hosted by Sankar Reddy and the topic is &#8220;Misconceptions in SQL Server&#8221; I thought I&#8217;d tackle a trio of table variable myths and partial truths. Table Variables are memory-only This one is pervasive and irritating. It [...]]]></description>
			<content:encoded><![CDATA[<p>So, it&#8217;s the second Tuesday of the month again, and it&#8217;s time for T-SQL Tuesday again. <a href="http://sankarreddy.com/2010/10/invitation-to-participate-in-t-sql-tuesday-11-misconceptions-in-sql-server/"><img class="alignright" style="border: 0pt none;" title="TSQL2sDay150x150" src="http://sqlinthewild.co.za/wp-content/uploads/2010/09/TSQL2sDay150x150.jpg" border="0" alt="TSQL2sDay150x150" width="154" height="154" align="right" /></a>This month it&#8217;s hosted by <a href="http://SankarReddy.com/">Sankar Reddy</a> and the topic is &#8220;<em><a href="http://sankarreddy.com/2010/10/invitation-to-participate-in-t-sql-tuesday-11-misconceptions-in-sql-server/">Misconceptions in SQL Server</a></em>&#8221;</p>
<p>I thought I&#8217;d tackle a trio of table variable myths and partial truths.</p>
<h3>Table Variables are memory-only</h3>
<p>This one is pervasive and irritating. It typically goes like this:</p>
<blockquote><p>You should use table variables rather than temp tables because table variables are memory only.</p></blockquote>
<p>This myth can be broken down into two parts:</p>
<ol>
<li>That table variables are not part of TempDB</li>
<li>That table variables are not written to disk</li>
</ol>
<p>The first is easy to prove and has <a href="http://www.texastoo.com/post/2009/11/26/Table-Variables-e28093-still-a-mystery.aspx">been</a> <a href="http://sql-troubles.blogspot.com/2010/08/temporary-tables-vs-table-variables.html">done</a> <a href="http://www.sqlservercentral.com/articles/Temporary+Tables/66720/">repeatedly</a>. I&#8217;m not doing it again. I&#8217;m going to tackle the second portion only.</p>
<p>See, one could argue that, even though the table variable is created in the TempDB system tables and allocated pages within the TempDB data file, it is still kept entirely and only in memory. Let&#8217;s see if that&#8217;s true…</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @LargeTable TABLE (
id INT IDENTITY PRIMARY KEY,
LargeStringColumn1 CHAR(100),
LargeStringColumn2 CHAR(100)
)

INSERT INTO @LargeTable (LargeStringColumn1, LargeStringColumn2)
SELECT TOP (100000) 'Table Variable Test','T-SQL Tuesday!'
FROM master.sys.columns a CROSS JOIN master.sys.columns b

WAITFOR DELAY '00:01:00' -- so that the table var doesn't go out of scope and get deallocated too quickly.</pre>
<p>This is not a massively large table. 100000 rows at 204 bytes per row (excluding header). A query of sys.dm_db_index_physical_stats (which does work on temp tables and table variables) reveals a total page count of 2632. That&#8217;s a grand total of 20.6 MB. 20 Megabytes. The SQL instance I&#8217;m running this on is allowed to use up to 2 GB of memory. No way on earth is this table variable going to cause any form of memory pressure (and I promise there is nothing else running)</p>
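<p>The page count quoted above can be checked with something along these lines. This is a sketch rather than the exact query used for this post. It has to run in the same batch as the table variable (after the INSERT and before the WAITFOR), and because table variables get system-generated names in TempDB, it simply lists every user table currently in TempDB, newest first.</p>
<pre class="brush: sql; title: ; notranslate">-- Run in the same batch as the DECLARE and INSERT above
SELECT t.name, ps.index_id, ps.index_type_desc, ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID('tempdb'), NULL, NULL, NULL, 'LIMITED') AS ps
  INNER JOIN tempdb.sys.tables AS t ON t.object_id = ps.object_id
ORDER BY t.create_date DESC</pre>
<p>Multiplying the page count by the 8 KB page size gives the size of the table: 2632 pages at 8 KB each is roughly 20.6 MB.</p>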
<p>So, run that code and, while that waitfor is running, do something that should never be done to a SQL server that you care anything about.<span id="more-687"></span></p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/10/DontTryThisAtHome.png"><img style="display: inline; border-width: 0px;" title="Dont Try This At Home" src="http://sqlinthewild.co.za/wp-content/uploads/2010/10/DontTryThisAtHome_thumb.png" border="0" alt="Dont Try This At Home" width="244" height="106" /></a></p>
<p>That&#8217;s going to kill SQL so fast that it&#8217;s not going to have a chance to clean up or deallocate anything on the way out. Just how I want it.</p>
<p>Now load up my favourite hex editor and open the TempDB data file and see if any rows from the table variable are there.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableVariableOnDisk.png"><img style="display: inline; border-width: 0px;" title="TableVariableOnDisk" src="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableVariableOnDisk_thumb.png" border="0" alt="TableVariableOnDisk" width="488" height="167" /></a></p>
<p>That pretty much speaks for itself. This myth, clearly false.</p>
<h3>Table Variables cannot be indexed</h3>
<p>Not too common, but I have seen this one floating around. It typically goes something like this:</p>
<blockquote><p>Table variables cannot have indexes created on them. The only exception is a clustered index defined as part of the primary key.</p></blockquote>
<p>Now there&#8217;s a small grain of truth in this. Both of the following return an error.</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @SomeTable TABLE (
ID int,
SomeColumn VARCHAR(20)
)
ALTER TABLE @SomeTable ADD CONSTRAINT pk_SomeTable PRIMARY KEY CLUSTERED (id)</pre>
<pre class="brush: sql; title: ; notranslate">DECLARE @SomeTable TABLE (
ID int,
SomeColumn VARCHAR(20)
)
CREATE INDEX idx_Testing ON @SomeTable (SomeColumn)</pre>
<blockquote><p><span style="color: #ff0000;">Msg 102, Level 15, State 1, Line 5<br />
Incorrect syntax near &#8216;@SomeTable&#8217;.</span></p></blockquote>
<p>Ok, so it&#8217;s not possible to run a CREATE INDEX or ALTER TABLE statement against a table variable, but does that mean that it&#8217;s limited to a single clustered index (defined as part of the primary key)?</p>
<p>It does not.</p>
<p>Firstly, there&#8217;s no requirement that the primary key be enforced by a clustered index. The following is perfectly valid.</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @Test TABLE (
ID INT NOT NULL PRIMARY KEY NONCLUSTERED,
SomeCol VARCHAR(20)
)</pre>
<p>A query against TempDB&#8217;s system tables with that table declared clearly shows two entries in sys.indexes for that table variable: index id 0 (the heap) and a single non-clustered index with an auto-generated name indicating that it is enforcing the primary key.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableVarPK.png"><img style="display: inline; border-width: 0px;" title="TableVarPK" src="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableVarPK_thumb.png" border="0" alt="TableVarPK" width="338" height="177" /></a></p>
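<p>A query of roughly this shape produces that kind of output. It&#8217;s a sketch under a couple of assumptions: it has to run in the same batch as the declaration (the table variable is gone once the batch ends), and it will also list any other temp objects that happen to exist in TempDB. The type_desc and is_primary_key columns are just there to make the result easier to read.</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @Test TABLE (
ID INT NOT NULL PRIMARY KEY NONCLUSTERED,
SomeCol VARCHAR(20)
)

-- Must be in the same batch as the DECLARE above.
SELECT t.name AS TableName, i.name AS IndexName, i.index_id, i.type_desc, i.is_primary_key
FROM tempdb.sys.tables t
INNER JOIN tempdb.sys.indexes i ON t.object_id = i.object_id</pre>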
<p>So does that mean that we can have one and only one index on a table variable?</p>
<p>Again, no.</p>
<p>We&#8217;re limited to creating any desired indexes as part of the table&#8217;s definition, but there are two constructs that can be defined that way. Primary key and unique constraints. We can define as many unique constraints as desired on a table variable (up to the limit of number of indexes on tables). If the columns that need to be indexed aren&#8217;t unique themselves, we can always add the primary key column(s) to the unique constraint so that the combination is always unique.</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @Test TABLE (
ID INT NOT NULL PRIMARY KEY,
IndexableColumn1 INT,
IndexableColumn2 DATETIME,
IndexableColumn3 VARCHAR(10),
UNIQUE (IndexableColumn1,ID),
UNIQUE (IndexableColumn2,ID),
UNIQUE (IndexableColumn3, IndexableColumn2, ID)
)

INSERT INTO @Test (ID, IndexableColumn1, IndexableColumn2, IndexableColumn3)
VALUES
(1,0,GETDATE(),'abc'),
(2,0,'2010/05/25','zzz'),
(3,1,GETDATE(),'zzz')

SELECT t.name, i.name, i.index_id
FROM tempdb.sys.tables t
INNER JOIN tempdb.sys.indexes i ON t.object_id = i.object_id</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableVarIndexes.png"><img style="display: inline; border-width: 0px;" title="TableVarIndexes" src="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableVarIndexes_thumb.png" border="0" alt="TableVarIndexes" width="315" height="114" /></a></p>
<p>If the primary key is enforced by the clustered index, adding its column(s) to a unique constraint does not make the resulting index any wider than it would be were it defined as a non-unique index with CREATE INDEX, because a non-unique non-clustered index always gets the clustering key added to its key columns.</p>
<p>I think that&#8217;s this myth suitably busted.</p>
<h3>Changes to Table Variables are not logged</h3>
<p>A fairly uncommon myth, but I have seen this a time or two, so I thought I&#8217;d tackle it as my third.</p>
<blockquote><p>Table variables don&#8217;t participate in transactions, hence nothing is written to the transaction log when changes are made to them.</p></blockquote>
<p>This again has two parts to it</p>
<ol>
<li>Table variables don&#8217;t participate in transactions</li>
<li>Operations on table variables are not logged</li>
</ol>
<p>The first part is completely true. Table variables do not participate in user transactions and they are not affected by an explicit rollback. Easily demonstrated.</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @TransactionTest TABLE (
ID INT IDENTITY PRIMARY KEY,
SomeCol VARCHAR(20)
)

INSERT INTO @TransactionTest (SomeCol) VALUES ('Row1')
INSERT INTO @TransactionTest (SomeCol) VALUES ('Row2')

BEGIN TRANSACTION
INSERT INTO @TransactionTest (SomeCol) VALUES ('Row3')
ROLLBACK TRANSACTION

SELECT * FROM @TransactionTest</pre>
<p>That final select returns 3 rows, not the two that might be expected. The rollback did not affect the table variable.</p>
<p>So does that lack of participation imply that there is no logging? Well, no. My university logic text would call this a Non sequitur fallacy (conclusion does not follow from its premises). The fact that explicit rollbacks don&#8217;t affect table variables in no way implies that there&#8217;s no logging happening. Let&#8217;s have a look into the transaction log to prove it.</p>
<pre class="brush: sql; title: ; notranslate">USE tempdb -- make sure that the correct database is in use
GO
CHECKPOINT -- To truncate the log and indicate the start of the test

DECLARE @TransactionTest TABLE (
ID INT,
SomeCol VARCHAR(20)
)

SELECT name AS TableVariableActualName FROM tempdb.sys.tables

INSERT INTO @TransactionTest (ID, SomeCol)
VALUES
(0,'Row1'),
(1,'Row2'),
(2,'Row3')

SELECT Operation, AllocUnitName, [Begin Time], [End Time] FROM fn_dblog(NULL, NULL)
GO</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableVariableLogging.png"><img style="display: inline; border-width: 0px;" title="TableVariableLogging" src="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableVariableLogging_thumb.png" border="0" alt="TableVariableLogging" width="474" height="333" /></a></p>
<p>The alloc unit name matches the table variable&#8217;s name as defined in the system tables, and the times for the begin and end transaction match. I don&#8217;t think there&#8217;s any arguing that the changes to the table variable are logged.</p>
<p>The next interesting question is whether there&#8217;s more or less logging than for a temp table, more or less logging than for a permanent table. Only one way to find out.</p>
<p>I&#8217;m going to run exactly the same code with the table variable replaced by a temp table (same structure) and then I&#8217;m going to create a new user database and run exactly the same code just using a permanent table.</p>
<p>First the temp table.</p>
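<p>The temp table version of the test might look something like the following. This is a sketch based on the table variable script above, with the same columns and the same three rows; the name #TransactionTest is just carried over for illustration. Bear in mind that the log will also contain the records for the CREATE TABLE itself, so not everything listed is down to the insert.</p>
<pre class="brush: sql; title: ; notranslate">USE tempdb -- make sure that the correct database is in use
GO
CHECKPOINT -- To truncate the log and indicate the start of the test

-- Same structure as the table variable, just declared as a temp table
CREATE TABLE #TransactionTest (
ID INT,
SomeCol VARCHAR(20)
)

INSERT INTO #TransactionTest (ID, SomeCol)
VALUES
(0,'Row1'),
(1,'Row2'),
(2,'Row3')

SELECT Operation, AllocUnitName, [Begin Time], [End Time] FROM fn_dblog(NULL, NULL)
GO

DROP TABLE #TransactionTest
GO</pre>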
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TempTableLogging.png"><img style="display: inline; border-width: 0px;" title="TempTableLogging" src="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TempTableLogging_thumb.png" border="0" alt="TempTableLogging" width="474" height="290" /></a></p>
<p>And now the permanent table in a user database</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableLogging.png"><img style="display: inline; border-width: 0px;" title="TableLogging" src="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableLogging_thumb.png" border="0" alt="TableLogging" width="474" height="317" /></a></p>
<p>From that it appears that the table variable logs less than the temp table, which logs less than the user table. However, the table variable does still have some logging done.</p>
<p>&#8216;But why?&#8217; I hear people asking. After all, TempDB <a href="http://support.microsoft.com/kb/307487">doesn&#8217;t log redo information</a> and, since table variables don&#8217;t participate in transactions there&#8217;s no need to log undo information. So why log at all?</p>
<p>Because an explicit rollback (ROLLBACK TRANSACTION) is not the only time that changes to a table will have to be undone. Consider this one.</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @TransactionTest TABLE (
ID INT PRIMARY KEY,
SomeCol VARCHAR(20)
)

INSERT INTO @TransactionTest (ID, SomeCol)
VALUES
(0,'Row1'),
(1,'Row2'),
(1,'Row3')</pre>
<p>That third row will fail with a primary key violation. If the table variable didn&#8217;t log at all, SQL would have no way of undoing the inserts of the first two rows when the third one fails. That&#8217;s not permitted. An insert is an atomic operation; it cannot partially succeed. Hence changes to a table variable must be logged sufficiently to allow SQL to generate the undo operations in cases like this. A glance over the transaction log shows in detail what happened.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableVariableRollback.png"><img style="display: inline; border-width: 0px;" title="TableVariableRollback" src="http://sqlinthewild.co.za/wp-content/uploads/2010/10/TableVariableRollback_thumb.png" border="0" alt="TableVariableRollback" width="474" height="245" /></a></p>
<p>Two rows inserted, followed by two rows deleted, as SQL generated operations to undo the insert statement, followed by an abort transaction.</p>
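<p>To see those log records without wading through everything else, the log query can be filtered down to the interesting operations. The following is a rough sketch rather than the exact script behind the screenshot. fn_dblog is undocumented, so treat the column and operation names here as assumptions based on what it returns on the versions I know of. The failed insert is a statement-level error, so with default settings the batch carries on to the SELECT.</p>
<pre class="brush: sql; title: ; notranslate">USE tempdb -- make sure that the correct database is in use
GO
CHECKPOINT -- start the test from a (mostly) empty log

DECLARE @TransactionTest TABLE (
ID INT PRIMARY KEY,
SomeCol VARCHAR(20)
)

INSERT INTO @TransactionTest (ID, SomeCol)
VALUES
(0,'Row1'),
(1,'Row2'),
(1,'Row3') -- fails with a primary key violation, the batch continues

-- Only the transaction boundaries and row-level changes
SELECT [Current LSN], Operation, Context, AllocUnitName
FROM fn_dblog(NULL, NULL)
WHERE Operation IN ('LOP_BEGIN_XACT', 'LOP_INSERT_ROWS', 'LOP_DELETE_ROWS', 'LOP_ABORT_XACT', 'LOP_COMMIT_XACT')
GO</pre>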
<p>I think that&#8217;s enough on this. As for the myth that changes to table variables aren&#8217;t logged, I believe that&#8217;s sufficiently disproven by this point.</p>
<h3>In Conclusion</h3>
<p>Table Variables are memory-only: False</p>
<p>Table Variables cannot be indexed: False</p>
<p>Changes to Table Variables are not logged: False</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/10/12/a-trio-of-table-variables/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
		<item>
		<title>In, Exists and join &#8211; a roundup</title>
		<link>http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/#comments</comments>
		<pubDate>Tue, 27 Apr 2010 16:30:47 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=586</guid>
		<description><![CDATA[Over the last several months I&#8217;ve had a look at IN, Exists, Join and their opposites to see how they perform and whether there&#8217;s any truth in the advice that is often seen on forums and blogs advocating replacing one with the other. Previous parts of this series can be found: Exists vs In In [...]]]></description>
			<content:encoded><![CDATA[<p>Over the last several months I&#8217;ve had a look at IN, Exists, Join and their opposites to see how they perform and whether there&#8217;s any truth in the advice that is often seen on forums and blogs advocating replacing one with the other.</p>
<p>Previous parts of this series can be found:</p>
<ul>
<li><a href="http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/">Exists vs In</a></li>
<li><a href="http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/">In vs Inner Join</a></li>
<li><a href="http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/">Not Exists vs Not In</a></li>
<li><a href="http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/">Not Exists vs Left Outer Join … Is Null</a></li>
</ul>
<p>In this roundup post, I&#8217;m going to do multiple tests on the 6 query forms, with different numbers of rows, indexes, no indexes and, for the negative forms (NOT IN, NOT EXISTS), nullable and non-nullable join columns.</p>
<p>In the individual tests, I used 250000 rows in the first table and around 3000 rows in the secondary table. In this roundup, I&#8217;m going to use 3 different row counts, 1000000 rows, 50000 rows and 2500 rows. That should give a reasonable idea for performance at various table sizes. (Not much point in going smaller than 2500 rows. Everything&#8217;s fast on 100 rows)</p>
<p>Some notes on the tests.</p>
<ul>
<li>The version of SQL is SQL Server 2008 SP1 x64 Developer Edition.</li>
<li>The tests were run on a laptop. Core-2 Duo, 3 GB memory. SQL limited to 1 processor, so no parallelism possible.</li>
<li>Each query will be run 10 times, reads, cpu and duration measured by profiler and averaged.</li>
<li>Each query will be run once before the tests start to ensure that the data is in cache and the execution plans are generated and cached.</li>
<li>Reproduction scripts will be available for download.</li>
</ul>
<h3><span id="more-586"></span>Exists vs. In vs. Inner Join</h3>
<p>First, no indexes on the join columns</p>
<table border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top"><strong>Table Size</strong></td>
<td valign="top"><strong>Operator</strong></td>
<td valign="top"><strong>CPU</strong></td>
<td valign="top"><strong>Reads </strong></td>
<td valign="top"><strong>Duration</strong></td>
</tr>
<tr>
<td rowspan="3">Large</td>
<td valign="top">IN</td>
<td valign="top">1293</td>
<td valign="top">14585</td>
<td valign="top">9649</td>
</tr>
<tr>
<td valign="top">Exists</td>
<td valign="top">1260</td>
<td valign="top">14585</td>
<td valign="top">9573</td>
</tr>
<tr>
<td valign="top">Inner Join</td>
<td valign="top">1302</td>
<td valign="top">14585</td>
<td valign="top">9716</td>
</tr>
<tr>
<td rowspan="3">Medium</td>
<td valign="top">IN</td>
<td valign="top">59</td>
<td valign="top">747</td>
<td valign="top">538</td>
</tr>
<tr>
<td valign="top">Exists</td>
<td valign="top">78</td>
<td valign="top">747</td>
<td valign="top">574</td>
</tr>
<tr>
<td valign="top">Inner Join</td>
<td valign="top">69</td>
<td valign="top">747</td>
<td valign="top">523</td>
</tr>
<tr>
<td rowspan="3">Small</td>
<td valign="top">IN</td>
<td valign="top">7</td>
<td valign="top">41</td>
<td valign="top">65</td>
</tr>
<tr>
<td valign="top">Exists</td>
<td valign="top">3</td>
<td valign="top">41</td>
<td valign="top">91</td>
</tr>
<tr>
<td valign="top">Inner Join</td>
<td valign="top">4</td>
<td valign="top">41</td>
<td valign="top">65</td>
</tr>
</tbody>
</table>
<p>Now with indexes on the join columns</p>
<table border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top"><strong>Table Size</strong></td>
<td valign="top"><strong>Operator</strong></td>
<td valign="top"><strong>CPU</strong></td>
<td valign="top"><strong>Reads </strong></td>
<td valign="top"><strong>Duration </strong></td>
</tr>
<tr>
<td rowspan="3">Large</td>
<td valign="top">IN</td>
<td valign="top">973</td>
<td valign="top">1760</td>
<td valign="top">9707</td>
</tr>
<tr>
<td valign="top">Exists</td>
<td valign="top">956</td>
<td valign="top">1760</td>
<td valign="top">9483</td>
</tr>
<tr>
<td valign="top">Inner Join</td>
<td valign="top">1173</td>
<td valign="top">1760</td>
<td valign="top">9539</td>
</tr>
<tr>
<td rowspan="3">Medium</td>
<td valign="top">IN</td>
<td valign="top">43</td>
<td valign="top">100</td>
<td valign="top">516</td>
</tr>
<tr>
<td valign="top">Exists</td>
<td valign="top">53</td>
<td valign="top">100</td>
<td valign="top">548</td>
</tr>
<tr>
<td valign="top">Inner Join</td>
<td valign="top">59</td>
<td valign="top">100</td>
<td valign="top">498</td>
</tr>
<tr>
<td rowspan="3">Small</td>
<td valign="top">IN</td>
<td valign="top">3</td>
<td valign="top">9</td>
<td valign="top">64</td>
</tr>
<tr>
<td valign="top">Exists</td>
<td valign="top">1</td>
<td valign="top">9</td>
<td valign="top">80</td>
</tr>
<tr>
<td valign="top">Inner Join</td>
<td valign="top">4</td>
<td valign="top">9</td>
<td valign="top">67</td>
</tr>
</tbody>
</table>
<h3>Not Exists vs. Not In vs. Left Outer Join &#8230; Is Null</h3>
<p>First test with the join columns nullable, no indexes</p>
<table border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top"><strong>Table Size</strong></td>
<td valign="top"><strong>Operator</strong></td>
<td valign="top"><strong>CPU</strong></td>
<td valign="top"><strong>Reads </strong></td>
<td valign="top"><strong>Duration </strong></td>
</tr>
<tr>
<td rowspan="3">Large</td>
<td valign="top">NOT IN</td>
<td valign="top">3194</td>
<td valign="top">2014622</td>
<td valign="top">3251</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">820</td>
<td valign="top">14585</td>
<td valign="top">837</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">962</td>
<td valign="top">14585</td>
<td valign="top">1025</td>
</tr>
<tr>
<td rowspan="3">Medium</td>
<td valign="top">NOT IN</td>
<td valign="top">174</td>
<td valign="top">100765</td>
<td valign="top">217</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">54</td>
<td valign="top">747</td>
<td valign="top">121</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">53</td>
<td valign="top">747</td>
<td valign="top">79</td>
</tr>
<tr>
<td rowspan="3">Small</td>
<td valign="top">NOT IN</td>
<td valign="top">12</td>
<td valign="top">5043</td>
<td valign="top">13</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">4</td>
<td valign="top">41</td>
<td valign="top">6</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">3</td>
<td valign="top">41</td>
<td valign="top">5</td>
</tr>
</tbody>
</table>
<p>Then with join columns nullable with indexes</p>
<table border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top"><strong>Table Size</strong></td>
<td valign="top"><strong>Operator</strong></td>
<td valign="top"><strong>CPU</strong></td>
<td valign="top"><strong>Reads </strong></td>
<td valign="top"><strong>Duration </strong></td>
</tr>
<tr>
<td rowspan="3">Large</td>
<td valign="top">NOT IN</td>
<td valign="top">2677</td>
<td valign="top">2001762</td>
<td valign="top">2726</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">569</td>
<td valign="top">1760</td>
<td valign="top">586</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">949</td>
<td valign="top">1760</td>
<td valign="top">1029</td>
</tr>
<tr>
<td rowspan="3">Medium</td>
<td valign="top">NOT IN</td>
<td valign="top">137</td>
<td valign="top">100102</td>
<td valign="top">164</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">40</td>
<td valign="top">100</td>
<td valign="top">104</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">48</td>
<td valign="top">100</td>
<td valign="top">69</td>
</tr>
<tr>
<td rowspan="3">Small</td>
<td valign="top">NOT IN</td>
<td valign="top">11</td>
<td valign="top">5011</td>
<td valign="top">12</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">3</td>
<td valign="top">9</td>
<td valign="top">4</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">6</td>
<td valign="top">9</td>
<td valign="top">6</td>
</tr>
</tbody>
</table>
<p>Now, let&#8217;s make the join columns not nullable. Again, no indexes to start with.</p>
<table border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top"><strong>Table Size</strong></td>
<td valign="top"><strong>Operator</strong></td>
<td valign="top"><strong>CPU</strong></td>
<td valign="top"><strong>Reads </strong></td>
<td valign="top"><strong>Duration </strong></td>
</tr>
<tr>
<td rowspan="3">Large</td>
<td valign="top">NOT IN</td>
<td valign="top">741</td>
<td valign="top">14585</td>
<td valign="top">753</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">784</td>
<td valign="top">14585</td>
<td valign="top">790</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">884</td>
<td valign="top">14585</td>
<td valign="top">937</td>
</tr>
<tr>
<td rowspan="3">Medium</td>
<td valign="top">NOT IN</td>
<td valign="top">43</td>
<td valign="top">747</td>
<td valign="top">103</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">49</td>
<td valign="top">747</td>
<td valign="top">120</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">53</td>
<td valign="top">747</td>
<td valign="top">74</td>
</tr>
<tr>
<td rowspan="3">Small</td>
<td valign="top">NOT IN</td>
<td valign="top">4</td>
<td valign="top">41</td>
<td valign="top">4</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">1</td>
<td valign="top">41</td>
<td valign="top">5</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">1</td>
<td valign="top">41</td>
<td valign="top">5</td>
</tr>
</tbody>
</table>
<p>and finally, join columns not nullable, with indexes</p>
<table border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top"><strong>Table Size</strong></td>
<td valign="top"><strong>Operator</strong></td>
<td valign="top"><strong>CPU</strong></td>
<td valign="top"><strong>Reads </strong></td>
<td valign="top"><strong>Duration </strong></td>
</tr>
<tr>
<td rowspan="3">Large</td>
<td valign="top">NOT IN</td>
<td valign="top">578</td>
<td valign="top">1382</td>
<td valign="top">588</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">585</td>
<td valign="top">1382</td>
<td valign="top">597</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">953</td>
<td valign="top">1382</td>
<td valign="top">1006</td>
</tr>
<tr>
<td rowspan="3">Medium</td>
<td valign="top">NOT IN</td>
<td valign="top">37</td>
<td valign="top">80</td>
<td valign="top">79</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">34</td>
<td valign="top">80</td>
<td valign="top">79</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">39</td>
<td valign="top">80</td>
<td valign="top">84</td>
</tr>
<tr>
<td rowspan="3">Small</td>
<td valign="top">NOT IN</td>
<td valign="top">3</td>
<td valign="top">8</td>
<td valign="top">4</td>
</tr>
<tr>
<td valign="top">NOT Exists</td>
<td valign="top">1</td>
<td valign="top">8</td>
<td valign="top">5</td>
</tr>
<tr>
<td valign="top">Outer Join</td>
<td valign="top">4</td>
<td valign="top">8</td>
<td valign="top">5</td>
</tr>
</tbody>
</table>
<p>These results seem to pretty much confirm the earlier conclusions.</p>
<p>Exists and IN perform much the same, whether there are indexes on the join column or not. When there are indexes on the join columns, the INNER JOIN is slightly (very slightly) slower, which is more noticeable on the large tables, much less on the medium or small ones. (Note I&#8217;m mostly looking at CPU time, as the duration is also affected by sending of results to client, in this case, lots and lots of results)</p>
<p>When it comes to NOT In and NOT Exists, they perform much the same when the columns involved are not nullable. If the columns are nullable, Not In is significantly slower because it has a different behaviour when nulls are present.</p>
<p>The join is slightly slower than Not Exists (or Not In on non-nullable columns), again only noticeable on the large table, probably because the optimiser has to do a full join with a secondary filter rather than the anti-semi join that it can use for Not Exists and Not In.</p>
<p>My conclusion from earlier posts stands. If all you are doing is looking for matching or non-matching rows and you don&#8217;t need any columns from the second table, use IN or Exists (or their negations), as appropriate for the situation. Only when you need columns from the second table should Join be used.</p>
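<p>As a reminder of what the three positive forms look like side by side, here is a minimal sketch. The BigTable and SmallerTable names and columns are an assumption, borrowed from the test tables used in the individual posts of this series; the actual roundup scripts are in the download. Note that the join form returns a row from the first table once per match, so it only gives the same results as the other two when the lookup column has no duplicates.</p>
<pre class="brush: sql; title: ; notranslate">-- IN: only checks for a match, returns each BigTable row at most once
SELECT ID, SomeColumn FROM BigTable
WHERE SomeColumn IN (SELECT LookupColumn FROM SmallerTable)

-- EXISTS: same semantics as the IN above
SELECT ID, SomeColumn FROM BigTable
WHERE EXISTS (SELECT 1 FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)

-- INNER JOIN: a full join, so duplicates in LookupColumn produce duplicate rows
SELECT BigTable.ID, BigTable.SomeColumn
FROM BigTable INNER JOIN SmallerTable ON BigTable.SomeColumn = SmallerTable.LookupColumn</pre>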
<p>I think (and hope) that this adequately concludes the discussion on the Exists and In and joins, both behaviour and performance.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/04/Reproduction-scripts.zip">Reproduction scripts</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/04/27/in-exists-and-join-a-roundup/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Left outer join vs NOT EXISTS</title>
		<link>http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/#comments</comments>
		<pubDate>Tue, 23 Mar 2010 14:00:58 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=575</guid>
		<description><![CDATA[And to wrap up the miniseries on IN, EXISTS and JOIN, a look at NOT EXISTS and LEFT OUTER JOIN for finding non-matching rows. For previous parts, see In vs Exists In vs Inner Join Not in vs Not Exists I&#8217;m looking at NOT EXISTS and LEFT OUTER JOIN, as opposed to NOT IN and [...]]]></description>
			<content:encoded><![CDATA[<p>And to wrap up the miniseries on IN, EXISTS and JOIN, a look at NOT EXISTS and LEFT OUTER JOIN for finding non-matching rows.</p>
<p>For previous parts, see</p>
<ul>
<li><a href="http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/">In vs Exists</a></li>
<li><a href="http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/">In vs Inner Join</a></li>
<li><a href="http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/">Not in vs Not Exists</a></li>
</ul>
<p>I&#8217;m looking at NOT EXISTS and LEFT OUTER JOIN, as opposed to NOT IN and LEFT OUTER JOIN, because, as shown in the <a href="http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/">previous part</a> of this series, NOT IN behaves badly in the presence of NULLs. Specifically, if there are any NULLs in the result set of the subquery, NOT IN returns 0 matches.</p>
<p>The LEFT OUTER JOIN, like the NOT EXISTS, can handle NULLs in the second result set without automatically returning no matches. It behaves the same regardless of whether the join columns are nullable or not. Seeing as NULL does not equal anything, any rows in the second result set that have NULL for the join column are eliminated by the join and have no further effect on the query.</p>
<p>It is important, when using the LEFT OUTER JOIN … IS NULL, to carefully pick the column used for the IS NULL check. It should either be a non-nullable column (the primary key is a somewhat classical choice) or the join column (as nulls in that will be eliminated by the join).</p>
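<p>The difference in NULL handling is easy to see on a tiny example. This is a self-contained sketch using two made-up table variables (@Main and @Lookup are hypothetical names, not part of the test setup), assuming the default ANSI_NULLS setting.</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @Main TABLE (Val INT NOT NULL)
DECLARE @Lookup TABLE (Val INT NULL)

INSERT INTO @Main (Val) VALUES (1),(2),(3)
INSERT INTO @Lookup (Val) VALUES (1),(NULL)

-- NOT IN: the NULL in @Lookup makes the predicate UNKNOWN for 2 and 3, so no rows are returned
SELECT Val FROM @Main
WHERE Val NOT IN (SELECT Val FROM @Lookup)

-- NOT EXISTS: the NULL simply never matches, returns 2 and 3
SELECT m.Val FROM @Main m
WHERE NOT EXISTS (SELECT 1 FROM @Lookup l WHERE l.Val = m.Val)

-- LEFT OUTER JOIN ... IS NULL on the join column: also returns 2 and 3
SELECT m.Val
FROM @Main m LEFT OUTER JOIN @Lookup l ON m.Val = l.Val
WHERE l.Val IS NULL</pre>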
<p>Onto the tests</p>
<p>The usual test tables…</p>
<pre class="brush: sql; title: ; notranslate">
CREATE TABLE BigTable (
id INT IDENTITY PRIMARY KEY,
SomeColumn char(4) NOT NULL,
Filler CHAR(100)
)

CREATE TABLE SmallerTable (
id INT IDENTITY PRIMARY KEY,
LookupColumn char(4) NOT NULL,
SomeArbDate Datetime default getdate()
)

INSERT INTO BigTable (SomeColumn)
SELECT top 250000
char(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) +
char(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))
from master.sys.columns a cross join master.sys.columns b

INSERT INTO SmallerTable (LookupColumn)
SELECT DISTINCT SomeColumn
FROM BigTable TABLESAMPLE (25 PERCENT)
-- (3918 row(s) affected)
</pre>
<p>First without indexes</p>
<pre class="brush: sql; title: ; notranslate">-- Query 1
SELECT BigTable.ID, SomeColumn
	FROM BigTable LEFT OUTER JOIN SmallerTable ON BigTable.SomeColumn = SmallerTable.LookupColumn
	WHERE LookupColumn IS NULL

-- Query 2
SELECT ID, SomeColumn FROM BigTable
WHERE NOT EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)
</pre>
<p>Let&#8217;s take a look at the execution plans</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/03/LeftOuterJoinNotIN_NotIndexed.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="LeftOuterJoinNotIN_NotIndexed" src="http://sqlinthewild.co.za/wp-content/uploads/2010/03/LeftOuterJoinNotIN_NotIndexed_thumb.png" border="0" alt="LeftOuterJoinNotIN_NotIndexed" width="244" height="143" /></a></p>
<p><span id="more-575"></span></p>
<p>The plans are almost the same. There&#8217;s an extra filter in the JOIN and the logical join types are different. Why the different joins?</p>
<p>If we look at the execution plan for the NOT EXISTS, the join type is Right Anti-Semi join (a bit of a mouthful). This is a special join type used by the NOT EXISTS and NOT IN and it&#8217;s the opposite of the semi-join that I discussed back when I looked at <a href="http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/">IN and INNER JOIN</a>.</p>
<p>An anti-semi join is a partial join. It does not actually join rows in from the second table, it simply checks for, in this case, the absence of matches. That&#8217;s why it&#8217;s an <strong>anti</strong>-semi join. A semi-join checks for matches, an anti-semi join does the opposite and checks for the absence of matches.</p>
<p>The extra filter in the LEFT OUTER JOIN query is because the join in that execution plan is a complete right join, i.e. it&#8217;s returned matching rows (and possibly duplicates) from the second table. The filter operator is doing the IS NULL filter.</p>
<p>That&#8217;s the major difference between these two. When using the LEFT OUTER JOIN … IS NULL technique, SQL can&#8217;t tell that you&#8217;re only doing a check for nonexistence. Optimiser&#8217;s not smart enough (yet). Hence it does the complete join and then filters. The NOT EXISTS filters as part of the join.</p>
<p>Technical discussion done, now how did they actually perform?</p>
<blockquote><p>&#8211; Query 1: LEFT OUTER JOIN<br />
Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 15, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 157 ms,  elapsed time = 486 ms.</p>
<p>&#8211; Query 2: NOT EXISTS<br />
Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 15, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 156 ms,  elapsed time = 358 ms.</p></blockquote>
<p>Can&#8217;t make a big deal out of that.</p>
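<p>Output in that format is what SET STATISTICS IO and SET STATISTICS TIME write to the messages tab. A minimal sketch of how one of the queries could be measured that way, assuming the test tables created above:</p>
<pre class="brush: sql; title: ; notranslate">SET STATISTICS IO ON -- reports scan count and logical/physical reads per table
SET STATISTICS TIME ON -- reports CPU time and elapsed time
GO

-- Query 2
SELECT ID, SomeColumn FROM BigTable
WHERE NOT EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)
GO

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO</pre>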
<p>Now, index on the join columns</p>
<pre class="brush: sql; title: ; notranslate">CREATE INDEX idx_BigTable_SomeColumn
ON BigTable (SomeColumn)

CREATE INDEX idx_SmallerTable_LookupColumn
ON SmallerTable (LookupColumn)</pre>
<p>and the same queries</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/03/LeftOuterJoinNotIN_Indexed.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="LeftOuterJoinNotIN_Indexed" src="http://sqlinthewild.co.za/wp-content/uploads/2010/03/LeftOuterJoinNotIN_Indexed_thumb.png" border="0" alt="LeftOuterJoinNotIN_Indexed" width="244" height="130" /></a></p>
<p>With indexes added, the execution plans are even more different. The LEFT OUTER JOIN is still doing the complete outer join with a filter afterwards. It&#8217;s interesting to note that it&#8217;s still a hash join, even though both inputs are sorted in the order of the join keys.</p>
<p>The Not Exists now has a stream aggregate (because duplicate values are irrelevant for an EXISTS/NOT EXISTS) and an anti-semi join. The join here is no longer hash, it&#8217;s now a merge join.</p>
<p>This echoes what I found when looking at <a href="http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/">IN vs Inner join</a>. When the columns were indexed, the inner join still went for a hash join but the IN changed to a merge join. At the time I thought it was a fluke; I&#8217;m not so sure any longer. More tests on this are required…</p>
<p>The costing of the plans indicates that the optimiser believes that the LEFT OUTER JOIN form is more expensive. Do the execution stats carry the same conclusion?</p>
<blockquote><p>&#8211; Query 1: LEFT OUTER JOIN<br />
Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 342, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 172 ms,  elapsed time = 686 ms.</p>
<p>&#8211; Query 2: NOT EXISTS<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 342, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 78 ms,  elapsed time = 388 ms.</p></blockquote>
<p>Well, yes, they do.</p>
<p>The reads (ignoring the existence of the worktable for the hash join) are the same. That&#8217;s to be expected, both queries executed with a single scan of each index.</p>
<p>The CPU time figures are not. The CPU time of the LEFT OUTER JOIN form (172 ms) is more than twice that of the NOT EXISTS (78 ms).</p>
<h3>In conclusion…</h3>
<p>If you need to find rows that don&#8217;t have a match in a second table, and the columns are nullable, use NOT EXISTS. If you need to find rows that don&#8217;t have a match in a second table, and the columns are not nullable, use NOT EXISTS or NOT IN.</p>
<p>The LEFT OUTER JOIN … IS NULL method is slower when the columns are indexed, and it&#8217;s perhaps not as clear what&#8217;s happening. It&#8217;s reasonably clear what a NOT EXISTS predicate does; with LEFT OUTER JOIN it&#8217;s not immediately clear that it&#8217;s a check for non-matching rows, especially if there are several WHERE clause predicates.</p>
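<p>As a rough illustration of the readability point, here are the two forms side by side on a pair of small table variables. The names @Big and @Small and their contents are made up for this sketch and are not the test tables used above. Both queries return only the 'BBBB' row.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical sample data, for illustration only
DECLARE @Big TABLE (id INT IDENTITY PRIMARY KEY, SomeColumn char(4) NOT NULL)
DECLARE @Small TABLE (LookupColumn char(4) NOT NULL)

INSERT INTO @Big (SomeColumn) VALUES ('AAAA')
INSERT INTO @Big (SomeColumn) VALUES ('BBBB')
INSERT INTO @Big (SomeColumn) VALUES ('CCCC')

INSERT INTO @Small (LookupColumn) VALUES ('AAAA')
INSERT INTO @Small (LookupColumn) VALUES ('CCCC')

-- LEFT OUTER JOIN ... IS NULL: join everything, then keep the rows
-- where the outer join found no partner
SELECT b.id, b.SomeColumn
FROM @Big b
LEFT OUTER JOIN @Small s ON b.SomeColumn = s.LookupColumn
WHERE s.LookupColumn IS NULL

-- NOT EXISTS: the predicate itself says "no matching row"
SELECT b.id, b.SomeColumn
FROM @Big b
WHERE NOT EXISTS (SELECT 1 FROM @Small s WHERE s.LookupColumn = b.SomeColumn)</pre>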
<p>I think that&#8217;s about that for this series. I&#8217;m going to do one more post summarising all the findings, probably in a week or two.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>NOT EXISTS vs NOT IN</title>
		<link>http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/#comments</comments>
		<pubDate>Thu, 18 Feb 2010 14:00:32 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=553</guid>
		<description><![CDATA[Continuing with the mini-series on query operators, I want to have a look at NOT EXISTS and NOT IN. Previous parts of this miniseries are: EXISTS vs IN IN vs INNER JOIN Just one note before diving into that. The examples I’m using are fairly simplistic and that’s intentional. I’m trying to find what, if [...]]]></description>
			<content:encoded><![CDATA[<p>Continuing with the mini-series on query operators, I want to have a look at NOT EXISTS and NOT IN.</p>
<p>Previous parts of this miniseries are:</p>
<ul>
<li><a href="http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/">EXISTS vs IN</a></li>
<li><a href="http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/">IN vs INNER JOIN</a></li>
</ul>
<p>Just one note before diving into that. The examples I’m using are fairly simplistic and that’s intentional. I’m trying to find what, if any, are the performance differences in a benchmark-style setup. I’ll have some comments on more complex examples in a later post.</p>
<p>The most important thing to note about NOT EXISTS and NOT IN is that, unlike EXISTS and IN,  they are not equivalent in all cases. Specifically, when NULLs are involved they will return different results. To be totally specific, when the subquery returns even one null, NOT IN will not match any rows.</p>
<p>The reason for this can be found by looking at the details of what the NOT IN operation actually means.</p>
<p>Let’s say, for illustration purposes that there are 4 rows in the table called t, there’s a column called ID with values 1..4</p>
<pre class="brush: sql; title: ; notranslate">WHERE SomeValue NOT IN (SELECT AVal FROM t)
-- is equivalent to
WHERE (
SomeValue != (SELECT AVal FROM t WHERE ID=1)
AND
SomeValue != (SELECT AVal FROM t WHERE ID=2)
AND
SomeValue != (SELECT AVal FROM t WHERE ID=3)
AND
SomeValue != (SELECT AVal FROM t WHERE ID=4)
)</pre>
<p>Let’s further say that AVal is NULL where ID = 4. Hence the last != comparison returns UNKNOWN. The logical truth table for AND states that UNKNOWN AND TRUE is UNKNOWN, and UNKNOWN AND FALSE is FALSE. There is no value that can be AND’d with UNKNOWN to produce the result TRUE.</p>
<p>Hence, if any row of that subquery returns NULL, the entire NOT IN predicate will evaluate to either FALSE or UNKNOWN and no records will be returned.</p>
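<p>A small sketch of that truth table, assuming the default SET ANSI_NULLS ON setting (with it off, comparisons to NULL behave differently). CASE cannot show UNKNOWN directly, so the second WHEN tests the negation: NOT FALSE is TRUE, while NOT UNKNOWN is still UNKNOWN and falls through to the ELSE. The column aliases are made up for the illustration.</p>
<pre class="brush: sql; title: ; notranslate">SELECT
 -- TRUE AND UNKNOWN: shows 'UNKNOWN'
 CASE WHEN (1 = 1 AND 1 != NULL) THEN 'TRUE'
      WHEN NOT (1 = 1 AND 1 != NULL) THEN 'FALSE'
      ELSE 'UNKNOWN' END AS TrueAndUnknown,
 -- FALSE AND UNKNOWN: shows 'FALSE'
 CASE WHEN (1 = 2 AND 1 != NULL) THEN 'TRUE'
      WHEN NOT (1 = 2 AND 1 != NULL) THEN 'FALSE'
      ELSE 'UNKNOWN' END AS FalseAndUnknown</pre>
<p>Neither column can come out as TRUE, which is exactly why a WHERE clause built on such an expression filters every row out.</p>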
<p>So what about EXISTS?</p>
<p><span id="more-553"></span></p>
<p>Exists cannot return NULL. It’s checking solely for the presence or absence of a row in the subquery and, hence, it can only return true or false. Since it cannot return NULL, there’s no possibility of a single NULL resulting in the entire expression evaluating to UNKNOWN.</p>
<p>Hence, when the column in the subquery that’s used for comparison with the outer table can have nulls in it, consider carefully which of NOT EXISTS or NOT IN you want to use.</p>
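<p>To see the difference in results, here is a minimal sketch using the four-row table t described above, with AVal NULL where ID = 4. The other AVal values and the @Candidates table are assumptions made up for the demonstration.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical data: t holds 1, 2, 3 and one NULL
DECLARE @t TABLE (ID int PRIMARY KEY, AVal int NULL)
INSERT INTO @t (ID, AVal) VALUES (1, 1)
INSERT INTO @t (ID, AVal) VALUES (2, 2)
INSERT INTO @t (ID, AVal) VALUES (3, 3)
INSERT INTO @t (ID, AVal) VALUES (4, NULL)

DECLARE @Candidates TABLE (SomeValue int NOT NULL)
INSERT INTO @Candidates (SomeValue) VALUES (1)
INSERT INTO @Candidates (SomeValue) VALUES (10)

-- NOT IN: returns no rows at all. For 10, the comparison with the NULL
-- is UNKNOWN, so the whole predicate is UNKNOWN.
SELECT SomeValue
FROM @Candidates
WHERE SomeValue NOT IN (SELECT AVal FROM @t)

-- NOT EXISTS: returns 10. The NULL row simply never matches,
-- so no row exists for 10 and the predicate is true.
SELECT SomeValue
FROM @Candidates c
WHERE NOT EXISTS (SELECT 1 FROM @t t WHERE t.AVal = c.SomeValue)</pre>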
<p>Ok, but say there are no nulls in the column. How do they compare speed-wise? I’m going to do two tests, one where the columns involved in the comparison are defined as NULL and one where they are defined as NOT NULL. There will be no null values in the columns in either case. In both cases, the join columns will be indexed. After all, we all index our join columns, right?</p>
<p>So, first test, non-nullable columns. First, some setup:</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE BigTable (
id INT IDENTITY PRIMARY KEY,
SomeColumn char(4) NOT NULL,
Filler CHAR(100)
)

CREATE TABLE SmallerTable (
id INT IDENTITY PRIMARY KEY,
LookupColumn char(4) NOT NULL,
SomeArbDate Datetime default getdate()
)

INSERT INTO BigTable (SomeColumn)
SELECT top 250000
char(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) +
char(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))
from master.sys.columns a cross join master.sys.columns b

INSERT INTO SmallerTable (LookupColumn)
SELECT DISTINCT SomeColumn
FROM BigTable TABLESAMPLE (25 PERCENT)
-- (3898 row(s) affected)

CREATE INDEX idx_BigTable_SomeColumn
ON BigTable (SomeColumn)
CREATE INDEX idx_SmallerTable_LookupColumn
ON SmallerTable (LookupColumn)</pre>
<p>Then the queries</p>
<pre class="brush: sql; title: ; notranslate">-- Query 1
SELECT ID, SomeColumn FROM BigTable
WHERE SomeColumn NOT IN (SELECT LookupColumn FROM SmallerTable)

-- Query 2
SELECT ID, SomeColumn FROM BigTable
WHERE NOT EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)</pre>
<p>The first thing to note is that the execution plans are identical.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/02/ExecPlansNOTNULL.png"><img style="display: inline; border-width: 0px;" title="ExecPlansNOTNULL" src="http://sqlinthewild.co.za/wp-content/uploads/2010/02/ExecPlansNOTNULL_thumb.png" border="0" alt="ExecPlansNOTNULL" width="244" height="130" /></a></p>
<p>The execution characteristics are also identical.</p>
<blockquote><p><strong>Query 1<br />
</strong>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 342, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 156 ms,  elapsed time = 221 ms.</p>
<p><strong>Query 2<br />
</strong>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 342, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 156 ms,  elapsed time = 247 ms.</p></blockquote>
<p>So, at least for the case where the columns are defined as NOT NULL, these two perform the same.</p>
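<p>The read counts and timings quoted above have the shape of the output from the STATISTICS IO and STATISTICS TIME session settings. A minimal sketch of switching them on around the first test query, assuming that is how the figures are captured:</p>
<pre class="brush: sql; title: ; notranslate">-- Report logical/physical reads per table and CPU/elapsed time per statement
SET STATISTICS IO ON
SET STATISTICS TIME ON

-- Query 1
SELECT ID, SomeColumn FROM BigTable
WHERE SomeColumn NOT IN (SELECT LookupColumn FROM SmallerTable)

-- Switch the reporting off again for the rest of the session
SET STATISTICS IO OFF
SET STATISTICS TIME OFF</pre>
<p>In Management Studio the figures show up on the Messages tab rather than with the query results.</p>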
<p>What about the case where the columns are defined as nullable? I&#8217;m going to simply alter the two columns involved without changing anything else, then test out the two queries again.</p>
<pre class="brush: sql; title: ; notranslate">ALTER TABLE BigTable
 ALTER COLUMN SomeColumn char(4) NULL

ALTER TABLE SmallerTable
 ALTER COLUMN LookupColumn char(4) NULL</pre>
<p>And the same two queries</p>
<pre class="brush: sql; title: ; notranslate">-- Query 1
SELECT ID, SomeColumn FROM BigTable
WHERE SomeColumn NOT IN (SELECT LookupColumn FROM SmallerTable)

-- Query 2
SELECT ID, SomeColumn FROM BigTable
WHERE NOT EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)</pre>
<p>And as for their performance…</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/02/ExecPlansNull.png"><img style="display: inline; border-width: 0px;" title="ExecPlansNull" src="http://sqlinthewild.co.za/wp-content/uploads/2010/02/ExecPlansNull_thumb.png" border="0" alt="ExecPlansNull" width="244" height="123" /></a></p>
<blockquote><p><strong>Query 1</strong><br />
Table &#8216;SmallerTable&#8217;. Scan count 3, logical reads 500011, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 437, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 827 ms,  elapsed time = 825 ms.</p>
<p><strong>Query 2<br />
</strong>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 437, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 9, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 156 ms,  elapsed time = 228 ms.</p></blockquote>
<p>Radically different execution plans, radically different performance characteristics. The NOT IN used over 5 times the CPU time, took over 3 times as long to run, and did over a thousand times more logical reads in total.</p>
<p>Why is that complex execution plan required when there may be nulls in the column? I can&#8217;t answer that one, probably only one of the query optimiser developers can; however, the results are obvious. When the columns allow nulls but have none, the NOT IN performs significantly worse than NOT EXISTS.</p>
<p>So, take-aways from this?</p>
<p>Most importantly, NOT EXISTS and NOT IN do not have the same behaviour when there are NULLs involved. Choose carefully which you want.</p>
<p>Columns that will never contain NULL values should be defined as NOT NULL so that SQL knows there will never be NULL values in them and so that it doesn’t have to produce complex plans to handle potential nulls.</p>
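<p>A quick way to find candidates is to check how the columns are declared and whether they actually hold any NULLs. This is only a sketch against the two test tables from this post; sys.columns is the standard catalog view, and the aliases are made up.</p>
<pre class="brush: sql; title: ; notranslate">-- How are the columns of the two test tables declared?
-- is_nullable = 1 means the column allows NULLs.
SELECT OBJECT_NAME(object_id) AS TableName, name AS ColumnName, is_nullable
FROM sys.columns
WHERE object_id IN (OBJECT_ID('BigTable'), OBJECT_ID('SmallerTable'))

-- Does the lookup column actually contain any NULLs?
SELECT COUNT(*) AS NullCount
FROM SmallerTable
WHERE LookupColumn IS NULL</pre>
<p>A column that allows NULLs, contains none, and by design never should is a candidate for being redefined as NOT NULL.</p>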
<p>On non-nullable columns, the behaviour and performance of NOT IN and NOT EXISTS are the same, so use whichever one works better for the specific situation.</p>
<p>One more to go on this: <a href="http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/">LEFT OUTER JOIN with the IS NULL check vs NOT EXISTS</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>IN vs INNER JOIN</title>
		<link>http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 14:00:58 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=394</guid>
		<description><![CDATA[Often in forum threads discussing query performance I&#8217;ll see people recommending replacing an INNER JOIN with an IN or EXISTS (or recommending replacing an IN or EXISTS with an INNER JOIN) for performance reasons. I&#8217;ve previously looked at how the IN and EXISTS compared, now I&#8217;m going to investigate and see how IN compares with [...]]]></description>
			<content:encoded><![CDATA[<p>Often in forum threads discussing query performance I&#8217;ll see people recommending replacing an INNER JOIN with an IN or EXISTS (or recommending replacing an IN or EXISTS with an INNER JOIN) for performance reasons. I&#8217;ve previously looked at how the <a href="http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/">IN and EXISTS</a> compared, now I&#8217;m going to investigate and see how IN compares with the join.</p>
<p>One very important thing to note right off is that they are not equivalent in all cases.</p>
<p>An inner join between two tables does a complete join, it checks for matches and returns rows. This means, if there are multiple matching rows in the second table, multiple rows will be returned. Also, when two tables are joined, columns can be returned from either.  As a quick example:</p>
<pre class="brush: sql; title: ; notranslate">
DECLARE @BigTable TABLE (
 id INT IDENTITY PRIMARY KEY,
 SomeColumn CHAR(4),
 Filler CHAR(100)
)

Insert into @BigTable(SomeColumn) Values (1)
Insert into @BigTable(SomeColumn) Values (2)
Insert into @BigTable(SomeColumn) Values (3)
Insert into @BigTable(SomeColumn) Values (4)
Insert into @BigTable(SomeColumn) Values (5)

DECLARE @SomeTable TABLE (IntCol int)
Insert into @SomeTable (IntCol) Values (1)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (3)
Insert into @SomeTable (IntCol) Values (4)
Insert into @SomeTable (IntCol) Values (5)
Insert into @SomeTable (IntCol) Values (5)

SELECT *
 FROM @BigTable b INNER JOIN @SomeTable  s ON b.SomeColumn = s.IntCol</pre>
<p>This returns 7 rows and returns columns from both tables. Because some of the values in @SomeTable are duplicated, the matching rows from @BigTable are returned twice.</p>
<p>With an IN, what is done is a semi-join, a join that checks for matches but does not return rows. This means if there are multiple matching rows in the resultset used for the IN, it doesn&#8217;t matter. Only one row from the first table will be returned. Also, because the rows are not returned, columns from the table referenced in the IN cannot be returned. As a quick example:</p>
<pre class="brush: sql; title: ; notranslate">
DECLARE @BigTable TABLE (
 id INT IDENTITY PRIMARY KEY,
 SomeColumn CHAR(4),
 Filler CHAR(100)
)

Insert into @BigTable(SomeColumn) Values (1)
Insert into @BigTable(SomeColumn) Values (2)
Insert into @BigTable(SomeColumn) Values (3)
Insert into @BigTable(SomeColumn) Values (4)
Insert into @BigTable(SomeColumn) Values (5)

DECLARE @SomeTable TABLE (IntCol int)
Insert into @SomeTable (IntCol) Values (1)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (3)
Insert into @SomeTable (IntCol) Values (4)
Insert into @SomeTable (IntCol) Values (5)
Insert into @SomeTable (IntCol) Values (5)

SELECT *
 FROM @BigTable
 WHERE SomeColumn IN (Select IntCol FROM @SomeTable)</pre>
<p>This returns 5 rows and only columns from @BigTable.</p>
<p>So, that said, how does the performance of the two differ for the cases where the results are identical (no duplicates in the second table, no columns needed from the second table)? For that, I&#8217;m going to need larger tables to play with.<span id="more-394"></span></p>
<pre class="brush: sql; title: ; notranslate">
CREATE TABLE BigTable (
id INT IDENTITY PRIMARY KEY,
SomeColumn CHAR(4),
Filler CHAR(100)
)

CREATE TABLE SmallerTable (
id INT IDENTITY PRIMARY KEY,
LookupColumn CHAR(4),
SomeArbDate DATETIME DEFAULT GETDATE()
)

INSERT INTO BigTable (SomeColumn)
SELECT top 250000 char(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) + char(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))
FROM master.sys.columns a CROSS JOIN master.sys.columns b

INSERT INTO SmallerTable (LookupColumn)
SELECT DISTINCT SomeColumn
FROM BigTable TABLESAMPLE (25 PERCENT)
-- (3819 row(s) affected)
</pre>
<p>That&#8217;s the setup done, now for the two test cases. Let’s first try without indexes and see how the INNER JOIN and IN compare. I&#8217;m selecting from just the first table to ensure that the two queries are logically identical. The DISTINCT used to populate the smaller table ensures that there are no duplicate rows in the smaller table.</p>
<pre class="brush: sql; title: ; notranslate">SELECT BigTable.ID, SomeColumn
FROM BigTable
WHERE SomeColumn IN (SELECT LookupColumn FROM dbo.SmallerTable)

SELECT BigTable.ID, SomeColumn
FROM BigTable
INNER JOIN SmallerTable ON dbo.BigTable.SomeColumn = dbo.SmallerTable.LookupColumn</pre>
<p>Something of interest straight away, the execution plans are almost identical. Not completely identical, but the only difference is that the hash join for the IN shows a Hash Match (Right Semi Join) and the hash join for the INNER JOIN shows a Hash Match (Inner Join)</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/01/InVsSelect-1.png"><img class="alignnone size-thumbnail wp-image-513" style="border: 1px solid black;" title="In Vs Select 1" src="http://sqlinthewild.co.za/wp-content/uploads/2010/01/InVsSelect-1-150x150.png" alt="In Vs Select 1" width="150" height="150" /></a></p>
<p>The IOs are the same and the durations are extremely similar. Here are the IO results and durations for five tests.</p>
<p>IN</p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 156 ms,  elapsed time = 2502 ms.<br />
CPU time = 157 ms,  elapsed time = 2323 ms.<br />
CPU time = 156 ms,  elapsed time = 2555 ms.<br />
CPU time = 188 ms,  elapsed time = 2381 ms.<br />
CPU time = 203 ms,  elapsed time = 2312 ms.</p></blockquote>
<p>INNER JOIN</p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 125 ms,  elapsed time = 2922 ms.<br />
CPU time = 140 ms,  elapsed time = 2372 ms.<br />
CPU time = 188 ms,  elapsed time = 2530 ms.<br />
CPU time = 203 ms,  elapsed time = 2323 ms.<br />
CPU time = 187 ms,  elapsed time = 2512 ms.</p></blockquote>
<p>Now let&#8217;s try with some indexes on the join columns.</p>
<pre class="brush: sql; title: ; notranslate">CREATE INDEX idx_BigTable_SomeColumn ON BigTable (SomeColumn)
CREATE INDEX idx_SmallerTable_LookupColumn ON SmallerTable (LookupColumn)</pre>
<p>Now when I run the two queries, the execution plans are different, and the costs of the two are no longer 50% each. Both do a single index scan on each table, but the IN has a Merge Join (Inner Join) and the INNER JOIN has a Hash Match (Inner Join).</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/01/InVsSelect-2.png"><img class="alignnone size-thumbnail wp-image-514" style="border: 1px solid black;" title="InVsSelect 2" src="http://sqlinthewild.co.za/wp-content/uploads/2010/01/InVsSelect-2-150x150.png" alt="InVsSelect 2" width="150" height="150" /></a></p>
<p>The IOs are still identical, other than the WorkTable that only appears for the Hash Join.</p>
<p>IN</p>
<blockquote><p>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.</p></blockquote>
<p>INNER JOIN</p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.</p></blockquote>
<p>So what about the durations? Honestly, it&#8217;s hard to say anything completely conclusive; the durations of both queries are quite small and they are very close. To see if there is any measurable difference, I&#8217;m going to run each one 100 times, use Profiler to log the duration and CPU and then average the results over the 100 executions. While running this, I&#8217;m also going to close/disable everything else I can on the computer, to try and get reasonably accurate times.</p>
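<p>For the repetition step, one option (assuming the queries are run from Management Studio or sqlcmd, where GO accepts a count) is to put each query in its own batch and repeat the batch. This is only a sketch; the CPU and duration figures still have to come from the trace.</p>
<pre class="brush: sql; title: ; notranslate">-- GO is a batch separator understood by the client tool, not a T-SQL statement.
-- GO 100 asks the tool to send the preceding batch 100 times.
SELECT BigTable.ID, SomeColumn
FROM BigTable
WHERE SomeColumn IN (SELECT LookupColumn FROM dbo.SmallerTable)
GO 100

SELECT BigTable.ID, SomeColumn
FROM BigTable
INNER JOIN SmallerTable ON dbo.BigTable.SomeColumn = dbo.SmallerTable.LookupColumn
GO 100</pre>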
<p>IN</p>
<p>Average CPU: 130.<br />
Avg duration: 2.78 seconds</p>
<p>INNER JOIN</p>
<p>Average CPU: 161.<br />
Avg duration: 2.93 seconds</p>
<p>Now is that enough to be significant? I&#8217;m not sure. However, looking at those results along with the IO and execution plans, I do have a recommendation for IN vs INNER JOIN.</p>
<p>If all you need is to check for matching rows in the other table but don&#8217;t need any columns from that table, use IN. If you do need columns from the second table, use Inner Join.</p>
<p>I still intend to go over <a href="http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/">NOT IN and NOT EXISTS</a> and, after this one, I also want to take a look at the <a href="http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/">LEFT JOIN with IS NULL check vs NOT EXISTS</a> for when you want rows from Table1 that don&#8217;t have a match in Table 2.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>EXISTS vs IN</title>
		<link>http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/</link>
		<comments>http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 14:01:37 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=287</guid>
		<description><![CDATA[This one&#8217;s come up a few times recently, so I&#8217;ll take a look at it. The EXISTS and IN clauses at first glance look fairly similar. They both use a subquery to evaluate rows, but they do it in a slightly different way IN does a direct match between the column specified before the IN [...]]]></description>
			<content:encoded><![CDATA[<p>This one&#8217;s come up a few times recently, so I&#8217;ll take a look at it.</p>
<p>The EXISTS and IN clauses at first glance look fairly similar. They both use a subquery to evaluate rows, but they do it in slightly different ways.</p>
<p>IN does a direct match between the column specified before the IN keyword and the values returned by the subquery. When using IN there can only be a single column specified in the select clause of the subquery.</p>
<p>Let&#8217;s have a look at a quick example</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @SomeTable TABLE (IntCol int)
Insert into @SomeTable (IntCol) Values (1)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (3)
Insert into @SomeTable (IntCol) Values (4)
Insert into @SomeTable (IntCol) Values (5)

SELECT *
FROM BigTable
WHERE SomeColumn IN (Select IntCol FROM @SomeTable)</pre>
<p>So this query returns all the rows from BigTable where SomeColumn has any of the values returned by the subquery, that is, 1, 2, 3, 4 or 5.</p>
<p>But what if there were duplicate values returned by the subquery?</p>
<p><span id="more-287"></span>Well, it actually doesn&#8217;t matter. All that SQL is looking for is what values the subquery returns to process the filter. It&#8217;s not joining the two resultsets together so it makes no difference to the results if there are duplicate values returned by the subquery.</p>
<p>To put it more technically, SQL&#8217;s doing a semi-join, a join that can only eliminate or qualify rows from the first table, but cannot duplicate them.</p>
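<p>A small self-contained sketch of that behaviour. The @OuterRows table and all the values are made up for the illustration; the point is only that duplicates in the subquery do not duplicate the outer rows.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical outer table with three distinct values
DECLARE @OuterRows TABLE (SomeColumn int)
INSERT INTO @OuterRows (SomeColumn) VALUES (1)
INSERT INTO @OuterRows (SomeColumn) VALUES (2)
INSERT INTO @OuterRows (SomeColumn) VALUES (6)

-- Subquery source with duplicated values
DECLARE @SomeTable TABLE (IntCol int)
INSERT INTO @SomeTable (IntCol) VALUES (1)
INSERT INTO @SomeTable (IntCol) VALUES (1)
INSERT INTO @SomeTable (IntCol) VALUES (2)
INSERT INTO @SomeTable (IntCol) VALUES (2)
INSERT INTO @SomeTable (IntCol) VALUES (2)

-- Returns two rows (1 and 2), once each, despite the duplicates.
-- 6 is eliminated because it has no match.
SELECT SomeColumn
FROM @OuterRows
WHERE SomeColumn IN (SELECT IntCol FROM @SomeTable)</pre>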
<p>So that&#8217;s IN. What about EXISTS?</p>
<p>Exists doesn&#8217;t check for a match, it doesn&#8217;t care in the slightest what values are being returned from the expression, it just checks for whether a row exists or not. Because of that, if there&#8217;s no predicate in the WHERE clause of the subquery that compares rows in the subquery with rows in the outer query, EXISTS will either return true for all the rows in the outer query or it will return false for all the rows in the outer query.</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @SomeTable TABLE (IntCol int)
Insert into @SomeTable (IntCol) Values (1)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (3)
Insert into @SomeTable (IntCol) Values (4)
Insert into @SomeTable (IntCol) Values (5)

SELECT *
FROM BigTable
WHERE EXISTS (Select IntCol FROM @SomeTable)</pre>
<p>This will return every single row in BigTable, because Select IntCol FROM @SomeTable returns 5 rows and hence the EXISTS predicate is always true.</p>
<p>Hence, to use EXISTS to do the same kind of thing as IN, there must be a correlation predicate within the subquery:</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @SomeTable TABLE (IntCol int)
Insert into @SomeTable (IntCol) Values (1)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (3)
Insert into @SomeTable (IntCol) Values (4)
Insert into @SomeTable (IntCol) Values (5)

SELECT *
FROM BigTable bt
WHERE EXISTS (Select IntCol FROM @SomeTable st WHERE bt.SomeColumn = st.IntCol)</pre>
<p>Now this will behave like the IN because it&#8217;s checking for matching rows and only returning true when there is a match. This will return all the rows from BigTable where SomeColumn has values 1,2,3,4 or 5 because those are the values in @SomeTable.</p>
<p>Exists is better for when comparisons are needed on two or more columns. For example, this cannot be done easily with an IN:</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @SomeTable TABLE (IntCol int, charCol char(1))
Insert into @SomeTable (IntCol, charCol) Values (1, 'a')
Insert into @SomeTable (IntCol, charCol) Values (2, 'a')
Insert into @SomeTable (IntCol, charCol) Values (3, 'a')
Insert into @SomeTable (IntCol, charCol) Values (4, 'b')
Insert into @SomeTable (IntCol, charCol) Values (5, 'b')

SELECT *
FROM BigTable bt
WHERE EXISTS (Select IntCol FROM @SomeTable st
WHERE bt.SomeColumn = st.IntCol AND bt.SomeOtherColumn = st.charCol)</pre>
<p>So that covers how they work, but how do they perform in comparison with each other? To answer that question, first I need some fairly large tables.</p>
<pre class="brush: sql; title: ; notranslate">Create Table BigTable (
id int identity primary key,
SomeColumn char(4),
Filler char(100)
)

Create Table SmallerTable (
id int identity primary key,
LookupColumn char(4),
SomeArbDate Datetime default getdate()
)

INSERT INTO BigTable (SomeColumn)
SELECT top 250000
char(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) +
char(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))
from master.sys.columns a cross join master.sys.columns b

INSERT INTO SmallerTable (LookupColumn)
SELECT DISTINCT SomeColumn
FROM BigTable TABLESAMPLE (25 PERCENT)
-- (3955 row(s) affected)
</pre>
<p>Let&#8217;s first try without indexes and see how EXISTS and IN compare.</p>
<pre class="brush: sql; title: ; notranslate">-- Query 1
SELECT ID, SomeColumn FROM BigTable
WHERE SomeColumn IN (SELECT LookupColumn FROM SmallerTable)

-- Query 2
SELECT ID, SomeColumn FROM BigTable
WHERE EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)</pre>
<p>The first thing to note is that the execution plans are identical. Two clustered index scans and a hash join (right semi-join).</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2009/08/exists1.png"><img class="alignnone size-thumbnail wp-image-297" title="Exists vs IN" src="http://sqlinthewild.co.za/wp-content/uploads/2009/08/exists1-150x150.png" alt="" width="150" height="150" /></a></p>
<p>The IOs are also identical.</p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 15, physical reads 0.</p></blockquote>
<p>So these two queries are executed by SQL in exactly the same way. No performance differences here.</p>
<p>Now, let me add indexes to both tables, on that join column and see what changes.</p>
<pre class="brush: sql; title: ; notranslate">CREATE INDEX idx_BigTable_SomeColumn
ON BigTable (SomeColumn)
CREATE INDEX idx_SmallerTable_LookupColumn
ON SmallerTable (LookupColumn)</pre>
<p>With those created, I&#8217;m going to run the above two queries again. Again the execution plans of the two are identical, though the hash join and clustered index scans are gone, replaced by index scans, a stream aggregate and a merge join (inner join)</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2009/08/exists2.png"><img class="alignnone size-thumbnail wp-image-298" title="Exists vs IN" src="http://sqlinthewild.co.za/wp-content/uploads/2009/08/exists2-150x150.png" alt="" width="150" height="150" /></a></p>
<p>The IOs are again identical and the execution times very close.</p>
<blockquote><p>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 343, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.</p></blockquote>
<p>So IN and EXISTS appear to perform identically both when there are no indexes on the matching columns and when there are, and this is true regardless of whether or not there are nulls in either the subquery or in the outer table.</p>
<p>Next up, a look at how <a href="http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/">IN compares to Inner Join</a> for the purposes of finding matching rows</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Functions, IO statistics and the Execution plan</title>
		<link>http://sqlinthewild.co.za/index.php/2009/04/29/functions-io-statistics-and-the-execution-plan/</link>
		<comments>http://sqlinthewild.co.za/index.php/2009/04/29/functions-io-statistics-and-the-execution-plan/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 21:21:49 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=231</guid>
		<description><![CDATA[It&#8217;s no secret that I&#8217;m not overly fond of most user-defined functions. This isn&#8217;t just a pet hate, I have some good reasons for disliking them. All too often they&#8217;re performance bottlenecks, but that can be said about many things in SQL. The bigger problem is that they&#8217;re hidden performance bottlenecks that often go overlooked [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s no secret that I&#8217;m not overly fond of most user-defined functions. This isn&#8217;t just a pet hate, I have some good reasons for disliking them. All too often they&#8217;re performance bottlenecks, but that can be said about many things in SQL. The bigger problem is that they&#8217;re hidden performance bottlenecks that often go overlooked and ignored for too long.</p>
<p>I&#8217;m going to start with this fairly simple scalar function, created in the AdventureWorks database:</p>
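<pre class="brush: sql; title: ; notranslate">-- Scalar UDF: total of all order lines for one product
CREATE FUNCTION dbo.LineItemTotal(@ProductID int)
RETURNS money
AS
BEGIN
    DECLARE @Total money;

    -- Sum LineTotal over every SalesOrderDetail row for the product
    SELECT @Total = SUM(LineTotal)
    FROM Sales.SalesOrderDetail
    WHERE ProductID = @ProductID;

    RETURN @Total;
END</pre>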
<p>So, given that function, the following two queries should be equivalent.</p>
<pre class="brush: sql; title: ; notranslate">SELECT productid, productnumber, dbo.LineItemTotal(productid) as SumTotal
FROM Production.Product p

SELECT productid, productnumber,
(select sum(LineTotal) from sales.SalesOrderDetail where productid = p.productid) AS SumTotal
FROM Production.Product p</pre>
<p>No problems so far. They both return 504 rows (in my copy of AW, which has been slightly padded out with more data). Now, let&#8217;s look at the execution characteristics by running them again with Statistics IO and Statistics Time on.</p>
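<p>As a minimal sketch of that setup (these are the standard session-level SET options; the query is the first of the two above), the statistics can be switched on before each run, and the output then appears alongside the results in the Messages tab of Management Studio:</p>
<pre class="brush: sql; title: ; notranslate">-- Report per-table reads and CPU/elapsed time for each statement in this session
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- Query 1: the scalar function version
SELECT productid, productnumber, dbo.LineItemTotal(productid) AS SumTotal
FROM Production.Product p;
GO

-- Switch the extra output off again when finished
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO</pre>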
<p>Query 1, the one with the scalar function:</p>
<blockquote><p>Table &#8216;Product&#8217;. Scan count 1, logical reads 4, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 47297 ms,  elapsed time = 47541 ms.</p></blockquote>
<p>Query 2, the one with the correlated subquery:</p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;SalesOrderDetail&#8217;. Scan count 3, logical reads 22536, physical reads 0.<br />
Table &#8216;Product&#8217;. Scan count 3, logical reads 40, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 1047 ms,  elapsed time = 1249 ms.</p></blockquote>
<p><span id="more-231"></span>There are two things to note here. The first is the execution time: while the query with the correlated subquery used just over a second of CPU time, the query with the function used more than 47 seconds. The second is the IO statistics, not so much for what&#8217;s there, but for what&#8217;s not there. In the IO stats for the query with the user-defined function, there&#8217;s no mention of the SalesOrderDetail table at all.</p>
<p>The output of Statistics IO only shows the IO characteristics of the outer query, not of the function, so from here we have absolutely no idea what kind of IO impact that function is causing.</p>
<p>So that&#8217;s one problem. Now what about the execution plan? From the CPU and elapsed times, it&#8217;s pretty clear that the query with the function is far more expensive than the query without.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2009/04/functionexecplan.png"><img class="alignnone size-medium wp-image-237" style="border: 1px solid black;" title="functionexecplan" src="http://sqlinthewild.co.za/wp-content/uploads/2009/04/functionexecplan-300x104.png" alt="" width="300" height="104" /></a></p>
<p>The costings on those plans are so far off as to be completely useless. The query that takes 1 sec of CPU time is apparently 100% of the cost of the batch and the one that takes 47 sec of CPU time is apparently 0% of the cost of the batch. Somehow I don&#8217;t think that&#8217;s right. There&#8217;s also no indication whatsoever as to what the function is doing. The user-defined function is represented by the Compute Scalar operator in the first plan.</p>
<p><img class="alignnone size-full wp-image-238" title="functionexecplan2" src="http://sqlinthewild.co.za/wp-content/uploads/2009/04/functionexecplan2.png" alt="" width="314" height="364" /></p>
<p>Again, there&#8217;s no indication at all as to the true cost of that function. Estimated I/O cost of 0? I somehow doubt it.</p>
<p>This is the main reason I have a problem with user-defined functions: the way they interact with both the query stats and the execution plan makes it very difficult to see what their impact really is when one&#8217;s doing performance testing. Someone who&#8217;s not very familiar with the intricacies and nuances of SQL, or who&#8217;s just using the execution plan without examining anything else, may mistakenly conclude that user-defined functions perform well.</p>
<p>So, how do you see what a query that uses a function is actually doing? The only way is to haul out Profiler and run a trace.</p>
<p>I&#8217;m going to start with a trace on just the SQL:BatchCompleted event, and I&#8217;m going to run those two queries separately to see if I can get a more accurate picture of the IO impact.</p>
<p>The profiler trace shows a very different picture in terms of IOs than Statistics IO did.<br />
<img class="alignnone size-full wp-image-239" title="functionprofiler1" src="http://sqlinthewild.co.za/wp-content/uploads/2009/04/functionprofiler1.png" alt="" width="501" height="43" /><br />
20000 reads for the query with the subquery, 4 and a half million for the query with the function. Ouch.</p>
<p>But wait, there&#8217;s more.</p>
<p>I&#8217;m going to run that profiler trace again, but this time I&#8217;m going to include the SP:Completed event.<br />
<a href="http://sqlinthewild.co.za/wp-content/uploads/2009/04/functionprofiler21.png"><img class="alignnone size-full wp-image-241" title="functionprofiler21" src="http://sqlinthewild.co.za/wp-content/uploads/2009/04/functionprofiler21.png" alt="" width="500" height="411" /></a></p>
<p>Running the query with the function resulted in 505 events picked up by Profiler. One was the batch-completed event, raised as the entire query completed its execution. The other 504 were all SP:Completed events. As I noted earlier, the query that I&#8217;m running here returns 504 rows.</p>
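<p>As an aside, and assuming a much later version than the one used for these tests (SQL Server 2016 or newer, where the sys.dm_exec_function_stats DMV exists), the same count can be cross-checked without a trace. This is only a sketch: the figures are cumulative since the function&#8217;s plan was cached, and if a newer version inlines the function into the calling query it may not show up here at all.</p>
<pre class="brush: sql; title: ; notranslate">-- Aggregate statistics for the cached scalar function in the current database
SELECT OBJECT_NAME(fs.object_id, fs.database_id) AS FunctionName,
       fs.execution_count,
       fs.total_logical_reads,
       fs.total_worker_time,   -- CPU time, in microseconds
       fs.total_elapsed_time   -- in microseconds
FROM sys.dm_exec_function_stats AS fs
WHERE fs.database_id = DB_ID()
  AND fs.object_id = OBJECT_ID('dbo.LineItemTotal');</pre>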
<p>The function is being executed once for each row in the query. That&#8217;s why the duration of the query is so high (each execution of the function takes between 70 and 140 ms, and 504 executions at an average of roughly 94 ms add up to the 47 seconds seen earlier) and it&#8217;s why the reads are so exceedingly high. Each time the function executes it&#8217;s (in my case) doing a table scan of the SalesOrderDetail table (I have no index on ProductID). If an index is added on that column, the performance of the function becomes much better, but that may not always be the case.</p>
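<p>For anyone wanting to repeat the test with an index, a minimal sketch follows. The index name is made up for illustration, and a standard AdventureWorks install may already have an index on this column, so check before creating another.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical index name; lets each function call seek on ProductID instead of scanning
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID_Test
ON Sales.SalesOrderDetail (ProductID);</pre>
<p>Even with the index in place, the function in this test still runs once per row of the outer query. The index only makes each individual execution cheaper.</p>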
<p>Now this was a fairly simple function. Imagine if the function was a few hundred lines of T-SQL with multiple queries in it. Not a pleasant thought.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2009/04/29/functions-io-statistics-and-the-execution-plan/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
	</channel>
</rss>

