<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SQL in the Wild &#187; Indexes</title>
	<atom:link href="http://sqlinthewild.co.za/index.php/category/sql-server/indexes/feed/" rel="self" type="application/rss+xml" />
	<link>http://sqlinthewild.co.za</link>
	<description>A discussion on SQL Server</description>
	<lastBuildDate>Wed, 25 Apr 2012 14:45:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>SQL University: Advanced Indexing &#8211; Indexing Strategies</title>
		<link>http://sqlinthewild.co.za/index.php/2011/11/11/sql-university-advanced-indexing-indexing-strategies/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/11/11/sql-university-advanced-indexing-indexing-strategies/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 15:00:00 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Indexes]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=1325</guid>
		<description><![CDATA[Right, I know it&#8217;s Friday and everyone&#8217;s tired and looking forward to the weekend, but I do need to finish off this indexing section and I&#8217;ll try to keep this short and interesting and hopefully keep everyone awake. There&#8217;s no shortage of information available on how to create indexes. Hell, I&#8217;ve written a copious amount [...]]]></description>
			<content:encoded><![CDATA[<p>Right, I know it&#8217;s Friday and everyone&#8217;s tired and looking forward to the weekend, but I do need to finish off this indexing section and I&#8217;ll try to keep this short and interesting and hopefully keep everyone awake.</p>
<p>There&#8217;s no shortage of information available on how to create indexes. Hell, I&#8217;ve written a copious amount myself. Most of these articles, however, are written from the point of view of indexing single queries: what you choose for a where clause, what has to go into the include to create the perfect index for this query. Now that&#8217;s all well and good, but I&#8217;ve never met a system that had only one query per table (maybe there is such a system out there, but I&#8217;ve never found it).</p>
<p>So what I&#8217;m going to try to do today is address the topic of a strategy for indexing: how to approach indexing, not for a single query, but for the system as a whole. I won&#8217;t be able to cover this in depth (this is material worthy of an entire book chapter, if not an entire book), but I can at least touch on the essential portions.</p>
<p>Now, there are two main positions that we could be in when considering indexing strategies for an entire system:<br />
1) A brand new system that&#8217;s still in development<br />
2) An existing system that&#8217;s being used actively.</p>
<p>One at a time&#8230;</p>
<h3>Indexing strategies for a brand new system</h3>
<p>Start by choosing a good clustered index. What makes a good clustered index? Well, it depends <img src='http://sqlinthewild.co.za/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<ul>
<li><a title="http://www.scarydba.com/2011/04/04/sql-universityrecommendations-for-a-clustered-index/" href="http://www.scarydba.com/2011/04/04/sql-universityrecommendations-for-a-clustered-index/">http://www.scarydba.com/2011/04/04/sql-universityrecommendations-for-a-clustered-index/</a></li>
<li><a title="http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx" href="http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx">http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx</a></li>
<li><a title="http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-Continues.aspx" href="http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-Continues.aspx">http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-Continues.aspx</a></li>
<li><a title="http://www.sqlskills.com/BLOGS/KIMBERLY/post/Ever-increasing-clustering-key-the-Clustered-Index-Debateagain!.aspx" href="http://www.sqlskills.com/BLOGS/KIMBERLY/post/Ever-increasing-clustering-key-the-Clustered-Index-Debateagain!.aspx">http://www.sqlskills.com/BLOGS/KIMBERLY/post/Ever-increasing-clustering-key-the-Clustered-Index-Debateagain!.aspx</a></li>
<li><a title="http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-again!.aspx" href="http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-again!.aspx">http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-again!.aspx</a></li>
<li><a title="http://www.sqlservercentral.com/articles/Indexing/68563/" href="http://www.sqlservercentral.com/articles/Indexing/68563/">http://www.sqlservercentral.com/articles/Indexing/68563/</a></li>
<li><a title="http://technet.microsoft.com/en-us/sqlserver/gg508879.aspx" href="http://technet.microsoft.com/en-us/sqlserver/gg508879.aspx">http://technet.microsoft.com/en-us/sqlserver/gg508879.aspx</a> (video)</li>
</ul>
<p>The clustered index is the base: it will affect each and every nonclustered index, and it&#8217;s not trivial to change once the system is in use, so choose carefully. I&#8217;m not saying another word on the subject of a clustered index, not today.</p>
<p>Once that&#8217;s done…</p>
<p><span id="more-1325"></span></p>
<p>Design any unique constraints, unique indexes or primary key constraints that are required by the database design. If the primary key should go on the column(s) that were chosen as the clustering key, great, the primary key gets created clustered. If not, then the clustered index goes on the column(s) chosen for the clustered index and the primary key gets created as nonclustered.</p>
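<p>As a rough sketch of that second case, the table below gets a nonclustered primary key and a separate clustered index. The table, column and index names are hypothetical, made up purely for illustration, and the choice of clustering key is only an example, not a recommendation.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical table: OrderID is the primary key, but the clustering key
-- chosen for this table is (OrderDate, OrderID).
CREATE TABLE dbo.Orders (
OrderID INT IDENTITY NOT NULL,
OrderDate DATETIME NOT NULL,
CustomerID INT NOT NULL,
CONSTRAINT pk_Orders PRIMARY KEY NONCLUSTERED (OrderID)
);

-- The clustered index goes on the columns chosen as the clustering key.
CREATE UNIQUE CLUSTERED INDEX idx_Orders_OrderDate
ON dbo.Orders (OrderDate, OrderID);</pre>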
<p>Index the foreign keys. These may end up not being the final indexes on these columns, but it&#8217;s an excellent place to start and it&#8217;s something that&#8217;s left out far, far too often.</p>
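<p>A minimal sketch of what that looks like, using two hypothetical tables (all names here are assumptions for the sake of the example). Creating the foreign key constraint does not create an index on the referencing column, so the index has to be created separately.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical parent and child tables.
CREATE TABLE dbo.Customers (
CustomerID INT IDENTITY PRIMARY KEY,
CustomerName VARCHAR(100) NOT NULL
);

CREATE TABLE dbo.CustomerOrders (
OrderID INT IDENTITY PRIMARY KEY,
CustomerID INT NOT NULL
    CONSTRAINT fk_CustomerOrders_Customers REFERENCES dbo.Customers (CustomerID),
OrderDate DATETIME NOT NULL
);

-- The constraint above does not index CustomerOrders.CustomerID, so do it explicitly.
CREATE NONCLUSTERED INDEX idx_CustomerOrders_CustomerID
ON dbo.CustomerOrders (CustomerID);</pre>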
<p>That&#8217;s the absolute bare minimum that must be done. That can all be done with just the database design. The rest is going to require some knowledge of the queries that will be running against the server.</p>
<p>Speak to the developers, see what queries they&#8217;re going to be sending to the database, speak to the business analysts (or users) and see what they think will be the commonly used aspects. Bear in mind that both the developers and business analysts (or users) may well be wrong. Not intentionally wrong, but wrong because they&#8217;re looking at things from a different perspective.</p>
<p>To give an example of that, a system I worked on some time back had custom security built-in to the DB (tables storing access rights to various sets of data). The users swore that the most accessed portion of the system was the address book. The developers claimed that the account balances procedure would have the heaviest impact. A trace showed that the custom security ran far more frequently than anything else.</p>
<p>Hence, if you can trace a workload (one from automated testing, user testing or acceptance testing is best), you should. Combine that with what the developers and business analysts say and the resulting picture should be reasonably accurate.</p>
<p>Find the most critical queries, the ones that are going to run often. These may be &#8216;housekeeping&#8217; queries like the custom security, or maybe they&#8217;ll be queries run when the user opens the app. It&#8217;s going to differ for everyone.</p>
<p>Create a minimal set of indexes to support the most critical, most frequent queries. Do not, at this point, try to index everything. It&#8217;s going to be a waste of time without accurate stats of how the users really use the system. Create just a minimum of indexes to start with. You want enough so that the system runs acceptably but not so many that you&#8217;ll be cleaning unused indexes off for months after the system goes live.</p>
<p>I&#8217;m not going to go into detail here on how to create indexes, see all the links that I gave in the <a href="http://sqlinthewild.co.za/index.php/2011/11/07/sql-university-advanced-indexing-sorting-and-grouping/">first part of this series</a>. There are, however, a few things to keep in mind:</p>
<ul>
<li>Fewer, wide indexes are better in general than lots of narrow indexes (<a title="http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/" href="http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/">http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/</a>)</li>
<li>Selectivity should not be the first thing you consider when choosing column order for indexes, especially indexes that are going to support multiple queries. (<a title="http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/" href="http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/">http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/</a> and <a title="http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/" href="http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/">http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/</a>)</li>
</ul>
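<p>To illustrate the first of those points, here is a small sketch with a hypothetical table and two hypothetical queries (none of these names come from a real system). One slightly wider index can support both queries, where the tempting alternative is a separate narrow index per column.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical table.
CREATE TABLE dbo.Invoices (
InvoiceID INT IDENTITY PRIMARY KEY,
CustomerID INT NOT NULL,
InvoiceDate DATETIME NOT NULL,
TotalDue MONEY NOT NULL
);

-- Two hypothetical queries, both returning InvoiceDate and TotalDue:
--   WHERE CustomerID = @CustomerID
--   WHERE CustomerID = @CustomerID AND InvoiceDate &gt;= @StartDate
-- Rather than one narrow index on CustomerID and another on InvoiceDate,
-- a single wider index can be used for seeks by both queries.
CREATE NONCLUSTERED INDEX idx_Invoices_CustomerID_InvoiceDate
ON dbo.Invoices (CustomerID, InvoiceDate)
INCLUDE (TotalDue);</pre>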
<p>Once done, come back to the indexes after the system is in use and re-evaluate.</p>
<h3>Indexing strategies for an established system</h3>
<p>This one depends heavily on how badly the system is indexed.</p>
<p>If tables are missing clustered indexes or primary keys that should be the first priority. This is harder than for a new system as, in the absence of enforced constraints, duplicate values could have crept into the supposedly unique columns.</p>
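<p>A quick way to check for that before trying to add a primary key or unique constraint is a simple grouping query along these lines. The table and column names are hypothetical, substitute the column(s) that are supposed to be unique.</p>
<pre class="brush: sql; title: ; notranslate">-- dbo.Accounts and AccountNumber are hypothetical names.
-- Any rows returned are values that would prevent a unique constraint from being created.
SELECT AccountNumber, COUNT(*) AS Occurrences
FROM dbo.Accounts
GROUP BY AccountNumber
HAVING COUNT(*) &gt; 1
ORDER BY Occurrences DESC;</pre>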
<p>If primary keys can&#8217;t easily be added (due to errant data), the clustered indexes should still be considered. All the same considerations as for a new system apply here. It&#8217;s worth noting that adding clustered indexes to huge tables is not a quick exercise.</p>
<p>There are two factors to fixing the indexing for an existing system:</p>
<ul>
<li>creating or widening indexes to support the current workload</li>
<li>removing indexes that are not used</li>
</ul>
<p><span style="font-weight: bold;">Creating or widening indexes</span></p>
<p>This should be done based on the workload. Either SQL Trace or Extended Events can be used to capture the queries running against the server. The captured workload can be examined manually or it can be submitted to the Database Engine Tuning Advisor (DTA).</p>
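<p>As a rough sketch of the Extended Events option, a session something like the following would capture completed batches and procedure calls to a file. This assumes SQL Server 2012 or later (the event and target names differ on SQL Server 2008), and the session name and file name are made up for the example. A real capture would probably want filters and a proper file path.</p>
<pre class="brush: sql; title: ; notranslate">-- WorkloadCapture and the file name are hypothetical.
CREATE EVENT SESSION WorkloadCapture ON SERVER
ADD EVENT sqlserver.rpc_completed
    (ACTION (sqlserver.database_name, sqlserver.sql_text)),
ADD EVENT sqlserver.sql_batch_completed
    (ACTION (sqlserver.database_name, sqlserver.sql_text))
ADD TARGET package0.event_file
    (SET filename = N'WorkloadCapture.xel')
WITH (STARTUP_STATE = OFF);
GO

-- Start capturing; stop it again with STATE = STOP once enough workload has been collected.
ALTER EVENT SESSION WorkloadCapture ON SERVER STATE = START;</pre>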
<p>If using DTA, that must not be the entire story. DTA recommendations have to be carefully examined and tested and implemented only if they make sense (and improve performance). Be wary of accidentally creating redundant indexes this way (<a href="http://www.sqlskills.com/BLOGS/KIMBERLY/post/UnderstandingDuplicateIndexes.aspx">How can you tell if an index is REALLY a duplicate?</a>)</p>
<p>If tuning manually, the same cautions mentioned in the &#8216;Indexing strategies for a brand new system&#8217; section apply.</p>
<p>Also, be aware that you can&#8217;t create perfect indexes for all queries (except maybe in a data warehouse, but likely not even then). Index for the important (frequent or high priority) queries, create indexes that can support multiple queries. Tune the system, not the individual query. See <a href="http://www.sqlskills.com/BLOGS/KIMBERLY/post/Indexes_JustBecauseUCan_NO.aspx">Indexes: just because you can, doesn&#8217;t mean you should!</a> (Kimberly Tripp).</p>
<p>If the less important, less frequently run queries aren&#8217;t quite as optimal as they could be (but are still in the acceptable range), that&#8217;s fine. Be very careful of over-indexing.</p>
<p><strong>Removing indexes</strong></p>
<p>To be honest, this is fraught with peril. Telling that an index is unused is not as easy as it may seem.</p>
<p>Start with the sys.dm_db_index_usage_stats DMV. Indexes that have no seeks, no scans and just updates, or ones that don&#8217;t appear in there at all may appear to be unused. Whether they are really unused however is another question.</p>
<p>The sys.dm_db_index_usage_stats DMV is cleared by a restart of SQL or any time the database is closed (beware auto_close). So if the SQL instance has only been running for three days, then the best that can be said about indexes that appear in sys.dm_db_index_usage_stats with no seeks and no scans is that they haven&#8217;t been used in three days.</p>
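<p>A query along these lines is one possible starting point. It is only a sketch: it lists nonclustered, non-unique indexes in the current database that have no recorded seeks, scans or lookups, and it checks when the instance last started so that the results can be read in context.</p>
<pre class="brush: sql; title: ; notranslate">-- How long has the instance been up? The usage stats can go back no further than this.
SELECT sqlserver_start_time FROM sys.dm_os_sys_info;

-- Nonclustered indexes with no recorded reads since the stats were last cleared.
-- The outer join keeps indexes that don't appear in the DMV at all.
SELECT OBJECT_NAME(i.object_id) AS TableName,
    i.name AS IndexName,
    us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
FROM sys.indexes AS i
LEFT OUTER JOIN sys.dm_db_index_usage_stats AS us
    ON us.object_id = i.object_id
    AND us.index_id = i.index_id
    AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
    AND i.type_desc = 'NONCLUSTERED'
    AND i.is_unique = 0 -- unique indexes enforce a rule whether or not they are read
    AND ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0) + ISNULL(us.user_lookups, 0) = 0;</pre>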
<p>Before deciding to drop any indexes monitor that DMV over a period of time. How long? Depends on your application. If it&#8217;s an app that has a steady and consistent usage, maybe not long. If it&#8217;s an app that has radically different usage patterns at different times of the month/year, then long enough that you capture them all.</p>
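<p>One way of doing that monitoring is to snapshot the DMV into a table on a schedule, so that the history survives restarts. The sketch below assumes a hypothetical history table called dbo.IndexUsageHistory, and how often to run the insert is left to you.</p>
<pre class="brush: sql; title: ; notranslate">-- dbo.IndexUsageHistory is a hypothetical table for storing snapshots.
CREATE TABLE dbo.IndexUsageHistory (
CaptureDate DATETIME NOT NULL DEFAULT GETDATE(),
ObjectID INT NOT NULL,
IndexID INT NOT NULL,
UserSeeks BIGINT NOT NULL,
UserScans BIGINT NOT NULL,
UserLookups BIGINT NOT NULL,
UserUpdates BIGINT NOT NULL
);
GO

-- Run this on a schedule (from a SQL Agent job, for example).
-- The counters are cumulative since the last time the DMV was cleared.
INSERT INTO dbo.IndexUsageHistory (ObjectID, IndexID, UserSeeks, UserScans, UserLookups, UserUpdates)
SELECT object_id, index_id, user_seeks, user_scans, user_lookups, user_updates
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID();</pre>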
<p>Also, make sure you keep documentation and preferably scripts of any indexes dropped, so that you can easily recreate them should it be necessary.</p>
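<p>If no script exists, the index definition can at least be recorded from the catalog views before the drop. This is a sketch only, using the CallLog table and homework index from this series as the example. It returns one row per index column with enough detail to rebuild the CREATE INDEX statement by hand.</p>
<pre class="brush: sql; title: ; notranslate">-- Record the definition of an index before dropping it.
SELECT i.name AS IndexName, i.type_desc, i.is_unique, i.filter_definition,
    c.name AS ColumnName, ic.key_ordinal, ic.is_descending_key, ic.is_included_column
FROM sys.indexes AS i
INNER JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
INNER JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.CallLog')
    AND i.name = 'idx_HomeworkAnswer'
ORDER BY ic.is_included_column, ic.key_ordinal;</pre>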
<h3>In conclusion</h3>
<p>Comprehensive indexing strategies are not exactly easy things to write short posts on, but I hope this has given some idea on how to fit all the pieces together. For another view on this topic, be sure to watch Kimberly&#8217;s video on indexing strategies: <a href="http://technet.microsoft.com/en-us/sqlserver/gg545006.aspx">http://technet.microsoft.com/en-us/sqlserver/gg545006.aspx</a></p>
<p>Oh, I almost forgot the answer to Wednesday&#8217;s homework (for anyone that&#8217;s still awake). There are likely many answers, but this one satisfies the requirements:</p>
<pre class="brush: sql; title: ; notranslate">CREATE NONCLUSTERED INDEX idx_HomeworkAnswer
ON dbo.CallLog (Severity, LastUpdateDate, CallStatus)
INCLUDE (AssignedTo, LogDate)
WHERE LastUpdateDate IS NOT NULL AND Severity IN (1,2)</pre>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/11/11/sql-university-advanced-indexing-indexing-strategies/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>SQL University: Advanced Indexing &#8211; Filtered Indexes</title>
		<link>http://sqlinthewild.co.za/index.php/2011/11/09/sql-university-advanced-indexing-filtered-indexes-2/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/11/09/sql-university-advanced-indexing-filtered-indexes-2/#comments</comments>
		<pubDate>Wed, 09 Nov 2011 14:30:00 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Indexes]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=1351</guid>
		<description><![CDATA[Welcome back to day 2 of Advanced Indexing. Today we&#8217;re going to look at a feature that was added in SQL Server 2008 &#8211; filtered indexes. In versions previous, indexes were always on the entire table. An index would always have the same number of rows as the table it was built on did (which [...]]]></description>
			<content:encoded><![CDATA[<p>Welcome back to day 2 of Advanced Indexing. Today we&#8217;re going to look at a feature that was added in SQL Server 2008 &#8211; filtered indexes.</p>
<p>In previous versions, indexes were always on the entire table. An index would always have the same number of rows as the table it was built on (which is why COUNT(*) can just scan the smallest index on the table).</p>
<p>With filtered indexes, it&#8217;s possible to have an index that&#8217;s built on a subset of the rows in the table. The definition for a filtered index contains a WHERE clause predicate that determines if a row in the table will be in the index or not.</p>
<p>This can be a major advantage on really large tables where most queries are only interested in a small fraction of the table. A normal index would be based on the entire table regardless of the fact that most of the table is of no interest, meaning the index would be larger than necessary, deeper than necessary and take up more space than would be ideal. With a filtered index on just the interesting portion of the table, the index size is kept to a minimum, meaning it&#8217;s shallower than an index on the entire table and hence more efficient.</p>
<p>A simple example of a filtered index would be</p>
<pre class="brush: sql; title: ; notranslate">CREATE NONCLUSTERED INDEX idx_Example
ON Account (AccountNumber)
WHERE Active = 1;</pre>
<p>There are two main uses of a filtered index:<br />
1) Enforcing moderately complex uniqueness requirements<br />
2) Supporting queries that filter on common subsets of a table</p>
<h3>Filtered indexes and unique columns</h3>
<p>One very interesting use of filtered indexes is in enforcing uniqueness over portions of a table. One requirement that comes up again and again is to have a nullable column that must have unique entries in it, but whose entries are optional. Basically, the column must be unique or null. Sounds easy, but the problem is that a unique index allows only one null. So much for nulls not being equal to anything including other nulls.</p>
<p>Prior to SQL 2008, implementing such a constraint required computed columns, indexed views or triggers. With SQL 2008&#8217;s filtered indexes, it&#8217;s trivial.</p>
<p><span id="more-1351"></span></p>
<pre class="brush: sql; title: ; notranslate">CREATE UNIQUE NONCLUSTERED INDEX idx_SomeTable_SomeColumn
ON SomeTable (SomeColumn)
WHERE SomeColumn IS NOT NULL;</pre>
<p>It has to be a unique index not a unique constraint as indexes can be filtered, constraints cannot.</p>
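<p>To see the behaviour in action, here is a small sketch against a hypothetical table (the table, column and index names are made up for the example). Multiple NULLs are accepted because rows with NULL are not in the index, while a repeated non-null value is rejected.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical table for demonstration.
CREATE TABLE dbo.Members (
MemberID INT IDENTITY PRIMARY KEY,
PassportNumber VARCHAR(20) NULL
);

CREATE UNIQUE NONCLUSTERED INDEX idx_Members_PassportNumber
ON dbo.Members (PassportNumber)
WHERE PassportNumber IS NOT NULL;

INSERT INTO dbo.Members (PassportNumber) VALUES (NULL);   -- succeeds
INSERT INTO dbo.Members (PassportNumber) VALUES (NULL);   -- succeeds, NULLs are not in the index
INSERT INTO dbo.Members (PassportNumber) VALUES ('A123'); -- succeeds
INSERT INTO dbo.Members (PassportNumber) VALUES ('A123'); -- fails with a duplicate key error</pre>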
<p>This can be extended to various forms of moderately complex unique requirements and is certainly an improvement over using indexed views or complex calculated columns (or just trusting the application to do things right).</p>
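<p>As one example of such a requirement, consider a rule that a login name must be unique among the rows that have not been logically deleted, while deleted rows may repeat names freely. The table here is hypothetical and the sketch is just one way the rule could be expressed.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical table with logical deletes.
CREATE TABLE dbo.Logins (
LoginID INT IDENTITY PRIMARY KEY,
LoginName VARCHAR(50) NOT NULL,
IsDeleted BIT NOT NULL DEFAULT 0
);

-- Uniqueness is enforced only over the rows where IsDeleted = 0.
CREATE UNIQUE NONCLUSTERED INDEX idx_Logins_LoginName_Active
ON dbo.Logins (LoginName)
WHERE IsDeleted = 0;</pre>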
<h3>Supporting queries</h3>
<p>The really interesting use of filtered indexes though is for supporting queries. Here filtered indexes are very useful in cases where queries against a table frequently include a specific filter.</p>
<p>A couple of common cases of this are tables that flag rows as active or inactive, where most queries are interested in only the active rows, and database designs where deletes are logical (an IsDeleted column) and almost every query filters for rows not marked as deleted.</p>
<p>Let&#8217;s have a look at a couple examples here. I&#8217;m not using AdventureWorks because the database design doesn&#8217;t include these kinds of patterns. The table design is given at the end of the post and a SQLDataGenerator project file is attached.</p>
<p>First let&#8217;s look at a simple example. This table stores support calls, and this query is looking for all recent open calls logged to one of the support people.</p>
<pre class="brush: sql; title: ; notranslate">SELECT CallID, LogDate, AssignedTo
FROM dbo.CallLog AS cl
WHERE CallStatus = 'Open'
AND AssignedTo = 42
AND LogDate &gt; DATEADD(ww,-1,GETDATE());</pre>
<p>Now, we could create a normal nonclustered index with CallStatus, AssignedTo and LogDate in the key, but let&#8217;s say that while the AssignedTo and LogDate filters change for this query, the filter is always, always, always for CallStatus = &#8216;Open&#8217;. This table has around 200 open calls and 100000 closed calls. Creating an index with the closed calls as well is just wasting space and time, no one&#8217;s interested. So, what I can do is this:</p>
<pre class="brush: sql; title: ; notranslate">CREATE NONCLUSTERED INDEX idx_CallLog_AssignedToLogDate
ON dbo.CallLog (AssignedTo, LogDate)
WHERE CallStatus = 'Open';</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex1.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="FilteredIndex1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex1_thumb.png" alt="FilteredIndex1" width="364" height="120" border="0" /></a></p>
<p>One thing to note here is that if the query filter exactly matches the index filter, the index doesn&#8217;t need to have that column as either a key or include column. It&#8217;s not being selected and the filter is entirely taken care of with the index&#8217;s filter.</p>
<p>It&#8217;s worth noting that there&#8217;s a bug relating to this, specifically around filtered indexes with an IS NULL filter. See <a href="http://connect.microsoft.com/SQLServer/feedback/details/454744/filtered-index-not-used-and-key-lookup-with-no-output/" target="_blank">http://connect.microsoft.com/SQLServer/feedback/details/454744/filtered-index-not-used-and-key-lookup-with-no-output/</a> The bug is still unfixed in the latest CTP of SQL Server 2012.</p>
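<p>To get a feel for how much smaller the filtered index is than the table, the row counts per index can be compared from the catalog views. This is only a sketch. With the data distribution described above, the filtered index should hold roughly the 200 open calls while the clustered index holds every row.</p>
<pre class="brush: sql; title: ; notranslate">-- Rows stored in each index on the table, along with any filter definition.
SELECT i.name AS IndexName, i.has_filter, i.filter_definition, p.rows
FROM sys.indexes AS i
INNER JOIN sys.partitions AS p
    ON p.object_id = i.object_id AND p.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.CallLog');</pre>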
<p>Let&#8217;s have a look at a second example, where the query&#8217;s filter is a subset of the index&#8217;s filter.</p>
<p>Let&#8217;s say that a very frequent query is for the urgent or urgent and high priority calls (severity 1 and 2). So these are two common queries:</p>
<pre class="brush: sql; title: ; notranslate">SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity &lt; 3 -- urgent and high
AND AssignedTo = 1;

SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity = 1  -- urgent
AND AssignedTo = 1;</pre>
<p>So, given that, I can create a filtered index on the larger of those ranges</p>
<pre class="brush: sql; title: ; notranslate">CREATE NONCLUSTERED INDEX idx_CallLog_Severity
ON dbo.CallLog (AssignedTo,    Severity)
INCLUDE (CallStatus, LogDate)
WHERE Severity &lt; 3</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex2.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="FilteredIndex2" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex2_thumb.png" alt="FilteredIndex2" width="364" height="206" border="0" /></a></p>
<p>In this case the filtered index can be used, but the column that&#8217;s being filtered on must also be in the index key, because the second query is filtering for a subset of the rows that the index contains.</p>
<p>An examination of the properties of the index seeks shows that for the first query there&#8217;s only one seek predicate – AssignedTo, whereas for the second query there are two seek predicates – AssignedTo and Severity</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex3.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="FilteredIndex3" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex3_thumb.png" alt="FilteredIndex3" width="204" height="314" border="0" /></a> <a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex4.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="FilteredIndex4" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex4_thumb.png" alt="FilteredIndex4" width="194" height="314" border="0" /></a></p>
<p>There are some limitations around filtered indexes and the matching of the filters. There are cases where a query and a filtered index have predicates that are logically equivalent, but where the filtered index can&#8217;t be used.</p>
<p>An example of this is not hard to generate. Let&#8217;s try a table that has an IsDeleted bit column (defined as not nullable)</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE Users (
UserID INT IDENTITY PRIMARY KEY,
UserName VARCHAR(50),
DepartmentID INT NOT NULL,
IsDeleted BIT NOT NULL DEFAULT 0
);

CREATE NONCLUSTERED INDEX idx_Users_DepartmentID
ON dbo.Users (DepartmentID)
INCLUDE (UserName)
WHERE IsDeleted = 0;</pre>
<p>The IsDeleted column is a non-nullable bit column. Hence it can only have two possible values, 0 and 1. Hence, these two queries are completely equivalent in their results</p>
<pre class="brush: sql; title: ; notranslate">SELECT UserName, DepartmentID
FROM dbo.Users
WHERE IsDeleted = 0 AND DepartmentID = 3;

SELECT UserName, DepartmentID
FROM dbo.Users
WHERE IsDeleted != 1 AND DepartmentID = 3;</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex71.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="FilteredIndex7" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex7_thumb.png" alt="FilteredIndex7" width="364" height="157" border="0" /></a></p>
<p>The first one uses the filtered index, the second does not. The second one scans the clustered index because despite the filter essentially being the same as the index filter, it’s not the same and the query hence cannot use the filtered index.</p>
<p>Another limitation has to do with parametrisation. If the query is passed in a parametrised form, or is subject to simple or forced parametrisation, then by the time that the optimiser gets the query there may not be sufficient information to tell that a filtered index is usable or not.</p>
<p>If we imagine the case of the table with the IsDeleted column again, a query that has a filter IsDeleted = 0 is definitely capable of using a filtered index that has the predicate IsDeleted = 0, but if the query gets parametrised and arrives at the optimiser in the form IsDeleted = @p1, there&#8217;s no way that query can match the filtered index because the value of @p1 on future executions could be 0, 1, NULL or 42, and in any case other than 0, if the cached plan used the filtered index it would produce incorrect results.</p>
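<p>For illustration, the explicitly parametrised form of that query might be passed in something like this (the parameter names are arbitrary). Because the plan is compiled for reuse with any value of @p1, the optimiser cannot assume that the filter will always be IsDeleted = 0.</p>
<pre class="brush: sql; title: ; notranslate">-- The value 0 is supplied at execution time, but the cached plan must be valid for any @p1.
EXEC sp_executesql
    N'SELECT UserName, DepartmentID
      FROM dbo.Users
      WHERE IsDeleted = @p1 AND DepartmentID = @p2;',
    N'@p1 BIT, @p2 INT',
    @p1 = 0, @p2 = 3;</pre>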
<p>We can see this by setting the database parametrisation to forced and re-running an earlier example.</p>
<p>With parametrisation simple and the index with a filter on severity &lt; 3, these two queries produce different execution plans</p>
<pre class="brush: sql; title: ; notranslate">SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity = 1
AND AssignedTo = 1;

SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity = 4
AND AssignedTo = 1;</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex5.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="FilteredIndex5" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex5_thumb.png" alt="FilteredIndex5" width="364" height="207" border="0" /></a></p>
<p>However, if the database is set for forced parametrisation, then the query is only seen in its parametrised form, and both queries have the same plan, one that does not use the filtered index.</p>
<pre class="brush: sql; title: ; notranslate">ALTER DATABASE Testing SET PARAMETERIZATION FORCED;
GO

SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity = 1
AND AssignedTo = 1;

SELECT CallID, LogDate, CallStatus, Severity
FROM dbo.CallLog AS cl
WHERE Severity = 4
AND AssignedTo = 1;</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex6.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="FilteredIndex6" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndex6_thumb.png" alt="FilteredIndex6" width="364" height="192" border="0" /></a></p>
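<p>If you are following along on a test database, remember to set the parametrisation back afterwards. The snippet below assumes the same Testing database as above and also shows one way to check the current setting.</p>
<pre class="brush: sql; title: ; notranslate">ALTER DATABASE Testing SET PARAMETERIZATION SIMPLE;
GO

-- is_parameterization_forced is 1 for forced, 0 for simple.
SELECT name, is_parameterization_forced
FROM sys.databases
WHERE name = 'Testing';</pre>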
<p>Right, so, homework for today. Given the table design below and the following queries, design one filtered index that both queries can use effectively (a filtered index that doesn’t filter out any rows is not an acceptable answer). Assume those are fixed queries that are frequently run with exactly that structure and exactly those values.</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE CallLog (
CallID INT IDENTITY PRIMARY KEY,
CallStatus CHAR(6) NOT NULL,
LogDate DATETIME NOT NULL,
LastUpdateDate DATETIME,
Title VARCHAR(500),
Severity TINYINT,
AssignedTo INT,
UserID INT
);

SELECT CallID, CallStatus, AssignedTo
FROM dbo.CallLog
WHERE CallStatus = 'Open' AND LastUpdateDate IS NOT NULL AND Severity = 1;

SELECT CallID, LogDate, LastUpdateDate FROM dbo.CallLog
WHERE LastUpdateDate &lt; DATEADD(dd,7,GETDATE()) AND Severity IN (1,2);</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/FilteredIndexes.zip">FilteredIndexes</a> (SQL DataGenerator project)</p>
<p>Answers for Monday&#8217;s homework:</p>
<p>1) Yes, this can use an index for both filter and group by. Index key columns would be (TransactionType, ReferenceOrderID, ProductID), in that order, and index include columns would be (Quantity, ActualCost)</p>
<p>2) No, because of the inequality we can use an index to support filtering or aggregating but not both. So there will either be a plan with an index seek and a hash aggregate (or sort and stream aggregate) or a plan with an index scan and a stream aggregate, but there&#8217;s no way to get a seek and stream aggregate without a sort</p>
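<p>For reference, the index described in answer 1 could be written something like this. The table name is a guess based on the column names (they match AdventureWorks&#8217; Production.TransactionHistory) and the index name is made up, so adjust both to match Monday&#8217;s query.</p>
<pre class="brush: sql; title: ; notranslate">-- Table name assumed from the column names; index name is hypothetical.
CREATE NONCLUSTERED INDEX idx_HomeworkMonday
ON Production.TransactionHistory (TransactionType, ReferenceOrderID, ProductID)
INCLUDE (Quantity, ActualCost);</pre>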
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/11/09/sql-university-advanced-indexing-filtered-indexes-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SQL University: Advanced indexing &#8211; Sorting and Grouping</title>
		<link>http://sqlinthewild.co.za/index.php/2011/11/07/sql-university-advanced-indexing-sorting-and-grouping/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/11/07/sql-university-advanced-indexing-sorting-and-grouping/#comments</comments>
		<pubDate>Mon, 07 Nov 2011 14:30:00 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Indexes]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=1321</guid>
		<description><![CDATA[Good day everyone and welcome to another week of SQL University. I know we’re getting close to the end of the year and everyone’s looking forward to a nice long vacation soaking up the sun at the beach, but a little bit of attention would be nice. Thank you. This week is Advanced Indexing, and [...]]]></description>
			<content:encoded><![CDATA[<p>Good day everyone and welcome to another week of SQL University. I know we’re getting close to the end of the year and everyone’s looking forward to a nice long vacation soaking up the sun at the beach, but a little bit of attention would be nice. Thank you.</p>
<p>This week is Advanced Indexing, and I mean advanced, none of that selectivity, SARGable, predicate stuff that gets repeated all over the place. If you need a refresher on the basics before we get started, the following can be considered pre-requisite reading for this course</p>
<ul>
<li><a href="http://www.scarydba.com/2010/07/19/sql-university-indexes-part-the-first/">Introduction to Indexes, Part the First</a></li>
<li><a href="http://www.scarydba.com/2010/07/21/sql-university-introduction-to-indexes-part-the-second/">Introduction to Indexes, Part the Second</a></li>
<li><a href="http://www.scarydba.com/2010/07/23/sql-university-introduction-to-indexes-part-the-third/">Introduction to Indexes, Part the Third</a></li>
<li><a href="http://www.scarydba.com/2011/04/04/sql-universityrecommendations-for-a-clustered-index/">Recommendations for a Clustered Index</a></li>
<li><a href="http://www.scarydba.com/2011/04/06/sql-university-index-usage/">Index Usage</a></li>
</ul>
<p>There’s also some additional background material available for more enthusiastic students:</p>
<ul>
<li><a href="http://www.sqlservercentral.com/articles/Indexing/68439/">Introduction to Indexes</a></li>
<li><a href="http://www.sqlservercentral.com/articles/Indexing/68563/">Introduction to Indexes: Part 2 – The clustered index</a></li>
<li><a href="http://www.sqlservercentral.com/articles/Indexing/68636/">Introduction to Indexes: Part 3 – The nonclustered index</a></li>
<li><a href="http://technet.microsoft.com/en-us/sqlserver/gg508878.aspx">Index Internals</a> (Video)</li>
<li><a href="http://technet.microsoft.com/en-us/sqlserver/gg508879.aspx">The Clustered Index Debate</a> (Video)</li>
<li><a href="http://sqlserverpedia.com/wiki/Index_Selectivity_and_Column_Order">Index Selectivity and Column Order</a></li>
</ul>
<p>Right, now that the admin has been handled, let&#8217;s get straight into things. Nothing like starting at the deep end…</p>
<p>Most people would rightly associate indexes with where clause predicates and joins, after all, the main usage of an index is to reduce the rows in consideration for a query as fast as possible. However there’s another portion of your queries that indexes can, if appropriately designed, help with – grouping and sorting.</p>
<p>Sorting is an extremely expensive operation, especially on large numbers of rows. For the academics in the audience, the algorithmic complexity of sorting is above linear, the time to sort a set of data increases faster than the number of items in the list. The common <a href="http://en.wikipedia.org/wiki/Sorting_algorithm">sorting algorithms</a> have an average time complexity of O(n log n). It’s better than O(n<sup>2</sup>), but it can still hurt at the higher row counts.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/On2.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="O(n^2)" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/On2_thumb.png" alt="O(n^2)" width="214" height="204" border="0" /></a> <a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/Onlogn.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="O(n log n)" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/Onlogn_thumb.png" alt="O(n log n)" width="210" height="204" border="0" /></a></p>
<p>O(n<sup>2</sup>) on the left, O(n log n) on the right (Thanks to <a href="http://www.quickmath.com">QuickMath</a>)</p>
<p>Right, the non-academics can wake up now.</p>
<p>The other reason that sorting hurts is that it needs a memory grant to do the sort. If there’s a shortage of memory the query could have to wait a while for the memory to be granted and, if the optimiser mis-estimates the number of rows to be sorted, the memory grant could be insufficient and the sort would have to spill into TempDB. You don’t want that happening.</p>
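<p>As a rough way of watching this happen (a sketch only; the choice of columns is mine and it needs VIEW SERVER STATE permission), the sys.dm_exec_query_memory_grants DMV lists the queries that currently hold, or are waiting for, a memory grant:</p>
<pre class="brush: sql; title: ; notranslate">-- Queries currently holding or waiting for a workspace memory grant
SELECT session_id ,
requested_memory_kb ,
granted_memory_kb ,  -- NULL while the request is still waiting
used_memory_kb ,
wait_time_ms
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC</pre>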
<p>Finally, sort is a blocking operator in the execution plan (all rows must have been fetched from the source before any output can be returned), and so the client won’t start seeing rows returned until the entire sort has been completed. This can make the query feel like it takes longer than it really does.</p>
<p>Grouping and aggregation are much the same. To aggregate one set of values based on another set of values, SQL has to get all the like values of the grouping columns together so that it can do the aggregation. That sounds suspiciously like a sort doesn’t it?</p>
<p>SQL doesn’t always sort to do aggregations, but the alternative – hash tables – isn’t exactly free (homework exercise – read up on hash tables)</p>
<p>So for both sorting and grouping, the query processor’s job would be a lot easier if there was some way that it could get the data ordered by the sorting or grouping columns without having to do the work of actually sorting. Sounds impossible? No.</p>
<p><span id="more-1321"></span></p>
<p>Enter indexes. The b-tree structure of an index makes it great for quickly finding matching rows, but today we’re more interested in the leaf-level of the index than the upper levels. The leaf level of an index is logically ordered by the index key (not physically ordered, let’s stop that myth right here please).</p>
<p>Because of that logical ordering, if the index leaf level is read, it can return the data in the order of the index key. If that order is what the query processor needs in order to process a sort or grouping, then there’s no need for an expensive sort to be done; the underlying order can be leveraged.</p>
<p>Now, before someone misquotes me and goes off crying out that data is always returned in the order of the index used, no it is not. If there’s no order by on a query, there’s no guarantee of order regardless of indexes. However if there is an order by, the query optimiser and query processor may be able to utilise the underlying index order and avoid the cost of actually sorting the data.</p>
<p>Ok, enough theory for now, let’s look at some practical examples. We’ll use AdventureWorks here and I’m going to use the TransactionHistoryArchive table (in the Production schema).</p>
<p>Firstly a simple (and silly) example:</p>
<pre class="brush: sql; title: ; notranslate">SELECT ProductID, TransactionDate, ActualCost
 FROM Production.TransactionHistoryArchive
 ORDER BY ActualCost</pre>
<p>This one’s not very realistic, but it’s a nice simple one to start with. Give me all the rows in the table ordered by the ActualCost column. There’s no filter here so many would say that indexes can’t help, and it’s true that an index can’t help with finding rows (because all are required), but it can help with the sort.</p>
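<p>For anyone following along, read counts and CPU times like the ones quoted in this post can be captured with the session statistics options. A minimal sketch, assuming the AdventureWorks sample database is installed:</p>
<pre class="brush: sql; title: ; notranslate">-- Report page reads and CPU/elapsed time in the Messages tab
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO

SELECT ProductID, TransactionDate, ActualCost
 FROM Production.TransactionHistoryArchive
 ORDER BY ActualCost
GO

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO</pre>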
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/SortExample1.png"><img style="display: inline; border-width: 0px;" title="SortExample1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/SortExample1_thumb.png" alt="SortExample1" width="484" height="160" border="0" /></a></p>
<blockquote><p>Table &#8216;TransactionHistoryArchive&#8217;. Scan count 1, logical reads 1419, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 421 ms,  elapsed time = 4855 ms.</p></blockquote>
<p>421ms of CPU time, with an estimated cost of 93% for the sort. Let’s see if we can make this any better.</p>
<p>If I create an index on ActualCost, that gives SQL the option of scanning the index (scan, because there’s no seek predicate) to retrieve the data in the order of the ActualCost column. I’m going to have to make it a covering index, as there is no way at all that SQL will willingly do key lookups for every single row of the table.</p>
<pre class="brush: sql; title: ; notranslate">CREATE NONCLUSTERED INDEX idx_TransactionHistoryArchive_ActualCost
 ON Production.TransactionHistoryArchive (ActualCost)
 INCLUDE (ProductID, TransactionDate)</pre>
<p>Let’s see how that’s changed the query’s execution.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/SortExample2.png"><img style="display: inline; border-width: 0px;" title="SortExample2" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/SortExample2_thumb.png" alt="SortExample2" width="364" height="94" border="0" /></a></p>
<blockquote><p>Table &#8216;TransactionHistoryArchive&#8217;. Scan count 1, logical reads 682, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 47 ms,  elapsed time = 4073 ms.</p></blockquote>
<p>The reads have roughly halved (1419 down to 682), because we’re now scanning a nonclustered index which is smaller than the clustered index, but that’s not the main point here. The CPU usage has dropped by around a factor of 9, from 421ms down to 47ms. If this was a critical query that ran several times a minute then this change could make a nice improvement to throughput and overall CPU usage.</p>
<p>That was a silly example, let’s try for something a little more complex:</p>
<pre class="brush: sql; title: ; notranslate">SELECT  ProductID ,
TransactionDate ,
TransactionType ,
Quantity ,
ActualCost
FROM Production.TransactionHistoryArchive
WHERE TransactionType = 'S'
AND ReferenceOrderID = 51739
ORDER BY TransactionDate</pre>
<p>If we look at the execution plan, there’s already an index in use. There’s an index on ReferenceOrderID, but it’s clearly not as good as it could be.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/SortExample3.png"><img style="display: inline; border-width: 0px;" title="SortExample3" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/SortExample3_thumb.png" alt="SortExample3" width="484" height="144" border="0" /></a></p>
<blockquote><p>Table &#8216;TransactionHistoryArchive&#8217;. Scan count 1, logical reads 222, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 0 ms,  elapsed time = 1 ms.</p></blockquote>
<p>Well, it’s not exactly taking ages on the CPU (it&#8217;s a tiny resultset), but this can still be better than it is. The key lookup is there because the existing index is only on two columns – ReferenceOrderID and ReferenceOrderLineID. I can’t modify that index for this query without potentially breaking some other query’s use of it, so I’ll create a new index.</p>
<pre class="brush: sql; title: ; notranslate">CREATE NONCLUSTERED INDEX idx_TransactionTypeReferenceOrderID
ON Production.TransactionHistoryArchive (TransactionType, ReferenceOrderID)
INCLUDE (ProductID, TransactionDate, Quantity, ActualCost)</pre>
<p>Why TransactionType first? Come back on Friday and I’ll be discussing that.</p>
<p>Plan’s now much simpler, the key lookup has gone, and now the sort is the majority of the cost</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/SortExample4.png"><img style="display: inline; border-width: 0px;" title="SortExample4" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/SortExample4_thumb.png" alt="SortExample4" width="484" height="122" border="0" /></a></p>
<p>The key to getting rid of that sort is to understand that a multi-column index is sorted by the entire key. So, if we have an index on Col1, Col2, then for rows where Col1 has the same value, those rows are listed within the index ordered by Col2. In other words, if we read the leaf level of that index it would look like this:</p>
<pre>Col1      Col2
A         1
A         3
A         8
B         4
B         8
B         9
B         14
C         0
C         1</pre>
<p>And so on and so on. Hence, if I filtered that by Col1 = B, the resulting rows could be returned ordered by Col2 with no additional work.</p>
<p>So in the above case, if I add the sort column (TransactionDate) as a key column in the index (it’s currently an Include column), then once the filter on TransactionType and ReferenceOrderID is done, the qualifying rows can be read in order of TransactionDate.</p>
<p>Let’s drop the index we created and create one with TransactionDate as an additional key column.</p>
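<pre class="brush: sql; title: ; notranslate">-- Drop the earlier version of the index first, as the name is being reused
DROP INDEX idx_TransactionTypeReferenceOrderID ON Production.TransactionHistoryArchive

CREATE NONCLUSTERED INDEX idx_TransactionTypeReferenceOrderID
ON Production.TransactionHistoryArchive (TransactionType, ReferenceOrderID, TransactionDate)
INCLUDE (ProductID, Quantity, ActualCost)</pre>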
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/SortExample5.png"><img style="display: inline; border-width: 0px;" title="SortExample5" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/SortExample5_thumb.png" alt="SortExample5" width="364" height="100" border="0" /></a></p>
<p>Success, the sort has gone!</p>
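<p>The same effect can be seen on a tiny scale with the Col1, Col2 example from earlier. This is a throwaway sketch: the temp table #OrderDemo and its index are hypothetical, made up purely for illustration, and the multi-row VALUES clause assumes SQL 2008 or later.</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE #OrderDemo (Col1 CHAR(1) NOT NULL, Col2 INT NOT NULL)

INSERT INTO #OrderDemo (Col1, Col2)
VALUES ('A', 1), ('A', 3), ('A', 8), ('B', 4), ('B', 8), ('B', 9), ('B', 14), ('C', 0), ('C', 1)

CREATE CLUSTERED INDEX idx_OrderDemo ON #OrderDemo (Col1, Col2)

-- Equality filter on Col1, so the qualifying rows are already in Col2 order within the index
SELECT Col1, Col2
FROM #OrderDemo
WHERE Col1 = 'B'
ORDER BY Col2

DROP TABLE #OrderDemo</pre>
<p>Check the execution plan for the select; with the filter being an equality on the leading key column there should be no need for a sort operator.</p>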
<p>One last example, then we’ll take a brief look at group by before finishing up for the day.</p>
<pre class="brush: sql; title: ; notranslate">SELECT  ProductID ,
TransactionDate ,
TransactionType ,
Quantity ,
ActualCost
FROM Production.TransactionHistoryArchive
WHERE ActualCost &gt; 2500
ORDER BY TransactionDate</pre>
<p>Can I use the same technique here to remove the need for a sort?</p>
<p>Let&#8217;s look at that simplistic index that I described earlier and see how the inequality would play out.</p>
<pre>Col1      Col2
A         1
A         3
A         8
B         4
B         8
B         9
B         14
C         0
C         1</pre>
<p>If I filter that for Col1 &gt; &#8216;A&#8217;, are the resultant rows ordered by Col2?</p>
<p>No. Because the filter on Col1 is an inequality, a filter that returns multiple different values of Col1, the results aren&#8217;t ordered by the second column and hence we can&#8217;t use an index here to support both the filter and the order.</p>
<p>Given this one, we could create an index to support the filter and have SQL sort the rows that qualified or we could create an index to support the order by and have SQL scan that and filter out rows that don&#8217;t match.</p>
<p>In general, the first option is the one that will be best in the majority of cases. The primary use of an index is to locate rows and filter resultsets. There are cases where the alternative may be appropriate, if the vast majority of the rows in the table qualify for the filter then it may be more optimal to support the sort and let SQL scan and filter. Definitely not the usual case though.</p>
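<p>As a sketch of what those two options could look like for the query above (the index names are my own invention, and which one the optimiser prefers will depend on how many rows qualify for the filter):</p>
<pre class="brush: sql; title: ; notranslate">-- Option 1: support the filter, SQL sorts the qualifying rows by TransactionDate
CREATE NONCLUSTERED INDEX idx_TransactionHistoryArchive_ActualCost_Filter
ON Production.TransactionHistoryArchive (ActualCost)
INCLUDE (ProductID, TransactionDate, TransactionType, Quantity)

-- Option 2: support the order by, SQL scans in TransactionDate order and filters on ActualCost
CREATE NONCLUSTERED INDEX idx_TransactionHistoryArchive_TransactionDate_Sort
ON Production.TransactionHistoryArchive (TransactionDate)
INCLUDE (ProductID, TransactionType, Quantity, ActualCost)</pre>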
<p>That about wraps up sorts, now for a quick look at group by.</p>
<p>As mentioned earlier, SQL has two ways to process grouping, it can do what is called a Stream Aggregate or it can use a hash table and do a Hash Aggregate. Stream Aggregate requires that the resultset be sorted in the order of the grouping columns. Well, given that fact and that we&#8217;ve spent a lot of time showing how to use indexes to support a sort, this should be quick and easy.</p>
<p>Let&#8217;s dive straight into some examples, because the theory is the same as for the sort described earlier</p>
<pre class="brush: sql; title: ; notranslate">SELECT ProductID, SUM(ActualCost) AS TotalCostPerProduct
FROM Production.TransactionHistoryArchive
GROUP BY ProductID</pre>
<p>This currently executes as a hash aggregate.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/Grouping1.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="Grouping1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/Grouping1_thumb.png" alt="Grouping1" width="364" height="84" border="0" /></a></p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.</p>
<p>Table &#8216;TransactionHistoryArchive&#8217;. Scan count 1, logical reads 1419, physical reads 0.</p>
<p>SQL Server Execution Times:</p>
<p>CPU time = 47 ms,  elapsed time = 46 ms.</p></blockquote>
<p>If we want a stream aggregate, then we need to get the rows entering the aggregation ordered by ProductID. SQL is not going to willingly sort the entire resultset (the hash aggregate is cheaper), so the only way we&#8217;re going to get a stream aggregate (other than with a hint) is by adding an index so that the data can be read from the index already ordered. There&#8217;s no filter here so it&#8217;s a straightforward index</p>
<pre class="brush: sql; title: ; notranslate">CREATE NONCLUSTERED INDEX idx_TransactionHistoryArchive_ProductID
ON Production.TransactionHistoryArchive (ProductID)
INCLUDE (ActualCost)</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/Grouping2.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="Grouping2" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/Grouping2_thumb.png" alt="Grouping2" width="364" height="73" border="0" /></a></p>
<blockquote><p>Table &#8216;TransactionHistoryArchive&#8217;. Scan count 1, logical reads 482, physical reads 0.</p>
<p>SQL Server Execution Times:</p>
<p>CPU time = 31 ms,  elapsed time = 34 ms.</p></blockquote>
<p>Not a massive improvement, but the principle is there.</p>
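<p>To compare the two aggregation strategies side by side without touching the indexes, the aggregate type can be forced with a query hint. This is a sketch for experimenting only, not something to leave in production code:</p>
<pre class="brush: sql; title: ; notranslate">-- Force a stream aggregate (SQL adds a sort if no suitably ordered index exists)
SELECT ProductID, SUM(ActualCost) AS TotalCostPerProduct
FROM Production.TransactionHistoryArchive
GROUP BY ProductID
OPTION (ORDER GROUP)

-- Force a hash aggregate
SELECT ProductID, SUM(ActualCost) AS TotalCostPerProduct
FROM Production.TransactionHistoryArchive
GROUP BY ProductID
OPTION (HASH GROUP)</pre>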
<p>Now, for homework, take these two queries and see firstly if it is possible to create an index to support both the filter and the group by and, if so, identify what that index is.</p>
<p><strong>Question 1</strong></p>
<pre class="brush: sql; title: ; notranslate">SELECT  ReferenceOrderID ,
ProductID ,
SUM(Quantity) AS TotalQuantity ,
SUM(ActualCost) AS TotalCost
FROM Production.TransactionHistoryArchive
WHERE TransactionType = 'S'
 GROUP BY ReferenceOrderID, ProductID</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/Homework1.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="Homework1" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/Homework1_thumb.png" alt="Homework1" width="484" height="78" border="0" /></a></p>
<p><strong>Question 2</strong></p>
<pre class="brush: sql; title: ; notranslate">SELECT ReferenceOrderID ,MIN(ActualCost)
FROM Production.TransactionHistoryArchive AS tha
WHERE TransactionDate &gt; '2004-01-01'
GROUP BY ReferenceOrderID</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/11/Homework2.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="Homework2" src="http://sqlinthewild.co.za/wp-content/uploads/2011/11/Homework2_thumb.png" alt="Homework2" width="484" height="78" border="0" /></a></p>
<p>Edit: And (as a late clarification) assume that the filter on transaction date is not always the same date.</p>
<p>Enough for today. Same time, same place Wednesday for a look at indexes on part of a table.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/11/07/sql-university-advanced-indexing-sorting-and-grouping/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Indexing for ORs</title>
		<link>http://sqlinthewild.co.za/index.php/2011/05/03/indexing-for-ors/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/05/03/indexing-for-ors/#comments</comments>
		<pubDate>Tue, 03 May 2011 14:00:00 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Indexes]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=961</guid>
		<description><![CDATA[All of the indexing strategy posts I&#8217;ve written in the past have been concerned with predicates combined with ANDs. That&#8217;s only one half of the possibilities though. There&#8217;s the case of predicates combined with ORs, and the guidelines for indexing that work well with ANDs don&#8217;t work with ORs. When dealing with predicates combined with [...]]]></description>
			<content:encoded><![CDATA[<p>All of the indexing strategy posts I&#8217;ve written in the past have been concerned with predicates combined with ANDs. That&#8217;s only one half of the possibilities though. There&#8217;s the case of predicates combined with ORs, and the guidelines for indexing that work well with ANDs don&#8217;t work with ORs.</p>
<p>When dealing with predicates combined with AND, the predicates are cumulative, each one operates to further reduce the resultset.</p>
<p>For this reason, multi-column indexes support multiple predicates combined with AND operators.</p>
<p>If we look at a quick example, consider the following.</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE Customers (
  CustomerID INT IDENTITY PRIMARY KEY,
  Surname VARCHAR(30) NOT NULL,
  FirstName VARCHAR(30),
  Title VARCHAR(5),
  CustomerType CHAR(1) NOT NULL,
  IsActive BIT DEFAULT 1 NOT NULL,
  RegistrationDate DATETIME NOT NULL DEFAULT GETDATE()
);
CREATE INDEX idx_Customers_SurnameFirstName ON Customers (Surname, FirstName);</pre>
<p>Again I&#8217;m going to be lazy and get SQLDataGenerator to generate a few rows.</p>
<p>With that two-column index in place and a query that looks for Surname = &#8216;Kelley&#8217; AND FirstName = &#8216;Rick&#8217;, SQL can do a double column seek to go directly to the start of the range then just read down the index to the end of the range, basically until it finds the first row that it&#8217;s not interested in.</p>
<p>So how does that differ when the operator is an OR?</p>
<p>The main difference is that with an OR, the predicates are independent. The second doesn&#8217;t serve to reduce the recordset, but rather to expand it. It&#8217;s similar to evaluating two separate predicates and combining the result. Let&#8217;s have a look at that 2 column index again when the two predicates are combined with an OR.</p>
<pre class="brush: sql; title: ; notranslate">SELECT CustomerID
  FROM Customers
  WHERE Surname = 'Kelley' OR FirstName = 'Rick';</pre>
<p>If we try to use that index to evaluate Surname = &#8216;Kelley&#8217; OR FirstName = &#8216;Rick&#8217;, there&#8217;s a problem. While the first of those predicates can be evaluated with a seek (it&#8217;s a sargable predicate on the left-most column of an index), the second predicate cannot. It&#8217;s sargable, but it is on the second column of the index (and for the moment let&#8217;s assume there are no other indexes on the table). Seeks are only possible if the predicate filters on a left-based subset of the index key.</p>
<p>Hence to evaluate that predicate SQL will have to do an index scan. Since it has to do a scan to evaluate the one predicate, it won&#8217;t bother also doing a seek to evaluate the first predicate as it can also evaluate that during the scan.</p>
<p>Hence, in this case, the query will execute with a single index scan.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/04/IndexScanWithOr.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="IndexScanWithOr" src="http://sqlinthewild.co.za/wp-content/uploads/2011/04/IndexScanWithOr_thumb.png" border="0" alt="IndexScanWithOr" width="484" height="92" /></a></p>
<p>So how do we get this query to rather seek?</p>
<p><span id="more-961"></span>The key is that the predicates are independent, each get evaluated separately. Given that they are evaluated separately, it&#8217;s not a stretch to conclude that they perhaps need separate indexes, and that is indeed the case.</p>
<p>Taking the demo above and splitting that two column index into two separate indexes, the query now does execute with an index seek, or to be more correct, with two of them.</p>
<pre class="brush: sql; title: ; notranslate">CREATE NONCLUSTERED INDEX idx_Customers_Surname ON dbo.Customers (Surname);
CREATE NONCLUSTERED INDEX idx_Customers_FirstName ON dbo.Customers (FirstName);</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/04/IndexSeekWithOrs.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="IndexSeekWithOrs" src="http://sqlinthewild.co.za/wp-content/uploads/2011/04/IndexSeekWithOrs_thumb.png" border="0" alt="IndexSeekWithOrs" width="484" height="150" /></a></p>
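<p>One way to picture what that plan is doing is as two separate queries with their results combined. This is only a sketch to illustrate the idea, not a suggestion to rewrite the query this way:</p>
<pre class="brush: sql; title: ; notranslate">-- Each branch can seek on its own single-column index.
-- UNION (rather than UNION ALL) removes customers that match both predicates,
-- which gives the same result as the OR because CustomerID is unique.
SELECT CustomerID FROM dbo.Customers WHERE Surname = 'Kelley'
UNION
SELECT CustomerID FROM dbo.Customers WHERE FirstName = 'Rick';</pre>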
<p>That&#8217;s the simple case dealt with. What about a more complex where clause, one with both AND and OR operators in it.</p>
<pre class="brush: sql; title: ; notranslate">SELECT CustomerID FROM dbo.Customers
  WHERE CustomerType = 'A'
    AND IsActive = 1
    AND (Surname = 'Kelley' OR FirstName = 'Rick');</pre>
<p>The easiest way to work this one out is to modify the form of that where clause. Boolean logic (specifically the distributive property) states that a AND (b OR c) is equal to (a AND b) OR (a AND c).</p>
<p>Converted as such, the query now looks like this</p>
<pre class="brush: sql; title: ; notranslate">SELECT CustomerID FROM dbo.Customers
  WHERE (CustomerType = 'A' AND IsActive = 1 AND Surname = 'Kelley')
  OR
    (CustomerType = 'A' AND IsActive = 1 AND FirstName = 'Rick');</pre>
<p>Now that has the same pattern as the previous query, so it is easy to identify the indexes. Each predicate (or set of predicates) combined with OR needs an index. There are two sets of predicates, so two indexes.</p>
<pre class="brush: sql; title: ; notranslate">CREATE INDEX idx_Customers_TypeActiveFirstName
ON dbo.Customers (CustomerType, IsActive, FirstName)
GO

CREATE INDEX idx_Customers_TypeActiveSurname
ON dbo.Customers (CustomerType, IsActive, Surname)
GO</pre>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/04/IndexSeekComplexOr.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="IndexSeekComplexOr" src="http://sqlinthewild.co.za/wp-content/uploads/2011/04/IndexSeekComplexOr_thumb.png" border="0" alt="IndexSeekComplexOr" width="484" height="146" /></a></p>
<p>Now I&#8217;m not suggesting that queries actually be written in that form. It&#8217;s more typing and can be confusing, and the query parser converts it back to the form a AND (b or c), but the query can be imagined in that form to make the indexes easier to work out.</p>
<p>The column order I&#8217;ve used there is not a requirement, the query seeks just as well when the order of the columns in one or both indexes is different. The order is more determined by other queries against the table that may be able to use one or both.</p>
<p>One more thing to look at, and that&#8217;s the case where the index doesn&#8217;t cover the query. As I&#8217;m sure most of us know, if the index is not covering and the predicate not highly selective, SQL is likely to ignore the index in favour of scanning the clustered index (or heap).</p>
<p>The same thing applies here, with one complication. Typically (well, in all the cases I tested) it was necessary for both indexes to be covering or SQL just goes off and scans the cluster, even when one or both predicates have very low row estimates (like 1 row). In fact, using hints to try and force two seeks (with key lookups) results in an error</p>
<pre class="brush: sql; title: ; notranslate">CREATE INDEX idx_Customers_IsActive
ON dbo.Customers (IsActive) INCLUDE (FirstName, Surname)

CREATE INDEX idx_Customers_RegistrationDate
ON dbo.Customers (RegistrationDate)

SELECT CustomerID, FirstName, Surname
FROM dbo.Customers
WHERE IsActive = 1 OR RegistrationDate = '2010-01-24'</pre>
<p>Now that looks like a perfectly reasonable query. There are two indexes, one for each predicate combined with OR. If I reduce the SELECT to just CustomerID SQL does indeed do two seeks and a merge join (concatenation) as seen in earlier execution plans. Add the FirstName and Surname back into the query and SQL switches to a clustered index scan.</p>
<p>Can I force this to seek? Well, yes (SQL 2008 and later only).</p>
<pre class="brush: sql; title: ; notranslate">SELECT CustomerID, FirstName, Surname
FROM dbo.Customers
WITH (FORCESEEK)
WHERE IsActive = 1 OR RegistrationDate = '2010-01-24'</pre>
<p>However this does not produce the plan that you might expect. Looking at how SQL processed the earlier queries, one might assume that SQL would seek both indexes, do a key lookup only on the rows returned from the index that&#8217;s not covering, then concatenate the two resultsets. Fair assumption, but that&#8217;s not what SQL does.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2011/04/NotWhatIWasExpecting.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;" title="NotWhatIWasExpecting" src="http://sqlinthewild.co.za/wp-content/uploads/2011/04/NotWhatIWasExpecting_thumb.png" border="0" alt="NotWhatIWasExpecting" width="484" height="154" /></a></p>
<p>Instead it seeks on both indexes, concatenates the resultsets, then does the key lookup on what is essentially half of the table. Not efficient at all.</p>
<p>The optimiser appears not to be considering the possibility that it could seek both, do a key lookup only on the rows returned from the predicate on RegistrationDate (as the other is seeking on a covering index and hence already has the columns needed) and then concatenate the two. No wonder it picks a cluster scan by preference.</p>
<p>So what happens if it&#8217;s not practical to make both indexes covering? That&#8217;s going to have to be a topic for another day. For now I hope this has cleared up a bit on indexing for queries using OR.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/05/03/indexing-for-ors/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Is a clustered index best for range queries?</title>
		<link>http://sqlinthewild.co.za/index.php/2011/02/01/is-a-clustered-index-best-for-range-queries/</link>
		<comments>http://sqlinthewild.co.za/index.php/2011/02/01/is-a-clustered-index-best-for-range-queries/#comments</comments>
		<pubDate>Tue, 01 Feb 2011 14:30:00 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Indexes]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=822</guid>
		<description><![CDATA[I see a lot of advice that talks about the clustered index being the best index for use for range queries, that is queries with inequality filters, queries that retrieve ranges of rows, as opposed to singleton queries, queries that retrieve single rows (including, unfortunately, a Technet article). I suspect the reasoning behind this advice [...]]]></description>
			<content:encoded><![CDATA[<p>I see a lot of advice that talks about the clustered index being the best index for use for range queries, that is queries with inequality filters, queries that retrieve ranges of rows, as opposed to singleton queries, queries that retrieve single rows (including, unfortunately, a <a href="http://technet.microsoft.com/en-us/library/ms190639.aspx">Technet article</a>).</p>
<p>I suspect the reasoning behind this advice is the idea that the clustered index stores the data in order of the clustering key (ack) and hence it&#8217;s &#8216;logical&#8217; that such a structure would be best for range scans as SQL can simply start at the beginning of the range and read sequentially to the end.</p>
<p>Question is, is that really the case?</p>
<p>Let&#8217;s do some experiments and find out.</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE TestingRangeQueries (
ID INT IDENTITY,
SomeValue NUMERIC(7,2),
Filler CHAR(500) DEFAULT ''
)

-- 1 million rows
INSERT INTO TestingRangeQueries (SomeValue)
SELECT TOP (1000000) RAND(CAST(a.object_id AS BIGINT) + b.column_id*2511)
FROM msdb.sys.columns a CROSS JOIN msdb.sys.columns b

-- One cluster and two nonclustered indexes on the column that will be used for the range filter

CREATE CLUSTERED INDEX idx_RangeQueries_Cluster
ON TestingRangeQueries (ID)

CREATE NONCLUSTERED INDEX idx_RangeQueries_NC1
ON TestingRangeQueries (ID)

CREATE NONCLUSTERED INDEX idx_RangeQueries_NC2
ON TestingRangeQueries (ID)
INCLUDE (SomeValue)
GO</pre>
<p>The query that I&#8217;ll be testing with will do a sum of the SomeValue column for a large range of ID values. That means that of the three indexes that I&#8217;m testing, one is clustered, one is a nonclustered that does not cover the query and the third is a covering nonclustered index.</p>
<pre class="brush: sql; title: ; notranslate">SELECT SUM(SomeValue)
FROM TestingRangeQueries
WHERE ID BETWEEN 20000 and 200000 -- 180 001 rows, 18% of the table</pre>
<p>I&#8217;m going to run the same range scan query three times, each with an index hint so that SQL will use the three different indexes, regardless of which one it thinks is best.</p>
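<p>The hinted queries would look something like this (a sketch, using the index names from the setup script above):</p>
<pre class="brush: sql; title: ; notranslate">-- Clustered index
SELECT SUM(SomeValue)
FROM TestingRangeQueries WITH (INDEX (idx_RangeQueries_Cluster))
WHERE ID BETWEEN 20000 AND 200000

-- Nonclustered index that does not cover the query
SELECT SUM(SomeValue)
FROM TestingRangeQueries WITH (INDEX (idx_RangeQueries_NC1))
WHERE ID BETWEEN 20000 AND 200000

-- Covering nonclustered index
SELECT SUM(SomeValue)
FROM TestingRangeQueries WITH (INDEX (idx_RangeQueries_NC2))
WHERE ID BETWEEN 20000 AND 200000</pre>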
<p>First up, the clustered index.</p>
<p>As expected, we get a clustered index seek (the predicate is SARGable) and a stream aggregate.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/12/ClusteredIndex.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="ClusteredIndex" src="http://sqlinthewild.co.za/wp-content/uploads/2010/12/ClusteredIndex_thumb.png" border="0" alt="ClusteredIndex" width="484" height="86" /></a></p>
<blockquote><p>Table &#8216;TestingRangeQueries&#8217;. Scan count 1, logical reads 12023, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 94 ms,  elapsed time = 110 ms.</p></blockquote>
<p><span id="more-822"></span></p>
<p>So if the advice is correct, this should be the best (lowest CPU, lowest IO). Let&#8217;s see…</p>
<p>The first nonclustered index does not cover the query. Hence, seeing as this query returns a substantial portion of the table, we could assume that the optimiser probably <a href="http://sqlinthewild.co.za/index.php/2009/01/09/seek-or-scan/">wouldn&#8217;t choose to use it</a> because of the cost of the key lookups. If that is the case, then the query probably won&#8217;t be very efficient if I force the use of that index.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/12/NonclusteredIndex1.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="NonclusteredIndex1" src="http://sqlinthewild.co.za/wp-content/uploads/2010/12/NonclusteredIndex1_thumb.png" border="0" alt="NonclusteredIndex1" width="484" height="115" /></a></p>
<blockquote><p>Table &#8216;TestingRangeQueries&#8217;. Scan count 1, logical reads 551413, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 562 ms,  elapsed time = 560 ms.</p></blockquote>
<p>Ow. Not very efficient at all. Those key lookups hurt.</p>
<p>One last index to test, the covering non-clustered index.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/12/NonclusteredIndex2.png"><img style="background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border-width: 0px;" title="NonclusteredIndex2" src="http://sqlinthewild.co.za/wp-content/uploads/2010/12/NonclusteredIndex2_thumb.png" border="0" alt="NonclusteredIndex2" width="484" height="95" /></a></p>
<blockquote><p>Table &#8216;TestingRangeQueries&#8217;. Scan count 1, logical reads 338, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 62 ms,  elapsed time = 67 ms.</p></blockquote>
<p>2/3 the CPU usage of the query using the clustered index and about 3% of the reads. No question about it, this one&#8217;s faster and less resource intensive. That pretty much invalidates the claim that the clustered index is best for range queries.</p>
<p>So what&#8217;s going on here?</p>
<p>The technet article I linked to at the beginning of this post states the following as reasoning for recommending a clustered index for range queries:</p>
<blockquote><p>After the row with the first value is found by using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent.</p></blockquote>
<p>Um, well, ignoring that there&#8217;s no guarantee of physical adjacency with an index at all, how does this differ from a nonclustered index?</p>
<p>In a clustered index, the leaf pages are logically ordered by the clustered index key (meaning that SQL can follow a page&#8217;s next page pointer to get the next page in the key order). To do a range query using the clustered index, SQL will seek down the b-tree to the start of the range and then read along the leaf pages, following the next page pointers, until it reaches the end of the range.</p>
<p>In a nonclustered index, the leaf pages are logically ordered by the index key (just the same as in a cluster). To do a range query using the nonclustered index, SQL will seek down the b-tree to the start of the range and then read along the leaf pages, following the next page pointers, until it reaches the end of the range. If additional columns are needed, SQL will then do a key/RID lookup for each row to retrieve the additional columns.</p>
<p>Not much difference there, other than the key lookups. So &#8216;physical adjacency&#8217; is pretty much ruled out as a reason for using the clustered index (if it was even true).</p>
<p>What is important, as we saw from the two tests of the nonclustered index, is that when the query is retrieving a significant portion of the table (and by &#8216;significant&#8217; I mean more than about 1%), the index needs to be covering, or the cost of the key lookups becomes overwhelming. Hence, what we want for a range query is a covering index.</p>
<p>The clustered index is always covering, because it contains, at the leaf level, all the columns of the table. It is, however, the largest index on a table<sup>1</sup>. The larger the index, the less efficient that index becomes. Hence while the clustered index is good for a range query, it&#8217;s not the best possible index for a range query.</p>
<p>The best possible index for a range query is the smallest index that is seekable and covers the query (the same as for just about any other query).</p>
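<p>To see how the sizes of the three test indexes compare, a query along these lines should do. It&#8217;s a sketch only: sys.dm_db_partition_stats and sys.indexes are standard catalog objects, the column aliases are just illustrative, and I&#8217;m assuming the test table is in the current database.</p>
<pre class="brush: sql; title: ; notranslate">-- Pages used by each index on the test table, largest first
SELECT i.name AS IndexName,
i.type_desc AS IndexType,
SUM(ps.used_page_count) AS UsedPages
FROM sys.dm_db_partition_stats ps
INNER JOIN sys.indexes i
ON ps.object_id = i.object_id AND ps.index_id = i.index_id
WHERE ps.object_id = OBJECT_ID('TestingRangeQueries')
GROUP BY i.name, i.type_desc
ORDER BY UsedPages DESC</pre>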
<p>Now it&#8217;s not always possible to cover a query, and some queries shouldn&#8217;t be covered. There will be times when the cluster is the best choice for range queries, either because of the number of columns required or because just about every query filters on a particular column and that column is a <a href="http://www.sqlservercentral.com/articles/Indexing/68563/">good choice for the cluster</a>. Just don&#8217;t make the mistake of thinking it&#8217;s the only choice.</p>
<p>(1) It is possible to have a nonclustered index that&#8217;s larger than the clustered index. Takes some work though and is far from a usual case.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2011/02/01/is-a-clustered-index-best-for-range-queries/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>One wide index or multiple narrow indexes?</title>
		<link>http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/#comments</comments>
		<pubDate>Tue, 14 Sep 2010 14:00:05 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Indexes]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=654</guid>
		<description><![CDATA[Or &#8220;If one index is good, surely many indexes (indexes? indices? indi?) will be better&#8221; This is a question that comes up very often on the forums. Something along the lines of: I have a query with multiple where clause conditions on a table. Should I create one index for each condition, or one index [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://michaeljswart.com/?p=844"><img style="margin: 0px 0px 0px 5px; display: inline; border: 0px;" title="TSQL2sDay150x150" src="http://sqlinthewild.co.za/wp-content/uploads/2010/09/TSQL2sDay150x150.jpg" border="0" alt="TSQL2sDay150x150" width="154" height="154" align="right" /></a> Or &#8220;<em>If one index is good, surely many indexes (indexes? indices? indi?) will be better</em>&#8221;</p>
<p>This is a question that comes up very often on the forums. Something along the lines of:</p>
<blockquote><p>I have a query with multiple where clause conditions on a table. Should I create one index for each condition, or one index with all the columns in it?</p></blockquote>
<p>The question basically boils down to this: Which is more optimal and more likely for the optimiser to pick, a single seek operation against a wide index that seeks on all three conditions in one go, or three seek operations against three indexes followed by a join to get back the final set of rows?</p>
<p>One thing to keep in mind is that one of the jobs of an index is to reduce the number of rows in consideration for a query as early as possible in the query&#8217;s execution.</p>
<p>So let&#8217;s take a made-up example. Let&#8217;s say we have a table with a number of columns in it. A query is run against that table with three conditions in the where clause</p>
<pre class="brush: sql; title: ; notranslate">WHERE ColA = @A AND ColB = @B AND ColC = @C</pre>
<p>Let&#8217;s further say that 1000 rows qualify for the condition ColA = @A, 15000 rows qualify for ColB = @B and 30000 rows qualify for ColC = @C. The total number of rows that qualify for all three conditions is 25.</p>
<p>Which sounds like it would be more efficient?</p>
<ul>
<li>Seek on an index with all three columns and retrieve just 25 rows</li>
<li>Seek on an index on ColA, retrieve 1000 rows, seek on an index on ColB, retrieve 15000 rows, seek on an index on ColC, retrieve 30000 rows then join the three result-sets together to get the desired 25 rows (called an Index Intersection)</li>
</ul>
<p>Time for some tests to find out.</p>
<p><span id="more-654"></span></p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE TestingIndexUsage (
id INT IDENTITY PRIMARY KEY,
FilterColumn1 INT,
FilterColumn2 INT,
FilterColumn3 INT,
Filler CHAR(500) DEFAULT ''-- simulate other columns in the table.
)
GO

INSERT INTO TestingIndexUsage (FilterColumn1, FilterColumn2, FilterColumn3)
SELECT TOP ( 1000000 )
ABS(CHECKSUM(NEWID()))%200,
ABS(CHECKSUM(NEWID()))%40,
ABS(CHECKSUM(NEWID()))%20
FROM msdb.sys.columns a CROSS JOIN msdb.sys.columns b
GO</pre>
<p>First off, I&#8217;m going to create three individual indexes on the three filter columns and see what kind of plan SQL comes up with.</p>
<pre class="brush: sql; title: ; notranslate">CREATE INDEX idx_Temp_FilterColumn1 ON dbo.TestingIndexUsage (FilterColumn1)
CREATE INDEX idx_Temp_FilterColumn2 ON dbo.TestingIndexUsage (FilterColumn2)
CREATE INDEX idx_Temp_FilterColumn3 ON dbo.TestingIndexUsage (FilterColumn3)</pre>
<p>And the query…</p>
<pre class="brush: sql; title: ; notranslate">SELECT ID FROM dbo.TestingIndexUsage
WHERE FilterColumn1 = 68 -- 4993 matching rows
AND FilterColumn2 = 26 -- 24818 matching rows
AND FilterColumn3 = 3  -- 49915 matching rows</pre>
<p>The comments show how many rows each predicate returns alone. Combined they return 19 rows.</p>
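<p>Because the test data is generated with NEWID(), the counts will differ on every run of the setup script. A quick way to check them on your own copy of the table (a sketch; the column aliases are made up for illustration):</p>
<pre class="brush: sql; title: ; notranslate">-- Rows matching each predicate alone, and all three together
SELECT
SUM(CASE WHEN FilterColumn1 = 68 THEN 1 ELSE 0 END) AS MatchFilter1,
SUM(CASE WHEN FilterColumn2 = 26 THEN 1 ELSE 0 END) AS MatchFilter2,
SUM(CASE WHEN FilterColumn3 = 3 THEN 1 ELSE 0 END) AS MatchFilter3,
SUM(CASE WHEN FilterColumn1 = 68 AND FilterColumn2 = 26 AND FilterColumn3 = 3 THEN 1 ELSE 0 END) AS MatchAll
FROM dbo.TestingIndexUsage</pre>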
<p>The plan shows the semi-expected index intersection. Seeks on 3 indexes, two merge join operators to join the three resultsets into one.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/09/IndexIntersection.png"><img style="display: inline; border: 0px;" title="IndexIntersection" src="http://sqlinthewild.co.za/wp-content/uploads/2010/09/IndexIntersection_thumb.png" border="0" alt="IndexIntersection" width="480" height="182" /></a></p>
<p>But what about the performance characteristics?</p>
<blockquote><p>Table &#8216;TestingIndexUsage&#8217;. Scan count 3, logical reads 150, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 422 ms,  elapsed time = 435 ms.</p></blockquote>
<p>The reads aren&#8217;t very high (as the indexes are extremely narrow), but that CPU time is not exactly low. Almost half a second on the CPU to return 19 rows from a 1 million row table? Not good, especially if this is going to run often.</p>
<p>Right, so that&#8217;s the three separate indexes. What about the case of a single index with all three columns? In this case, because all three are SARGable equality predicates, <a href="http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/">the order of the columns isn&#8217;t critical for index usage</a>, so I&#8217;ll put them in order of selectivity.</p>
<pre class="brush: sql; title: ; notranslate">DROP INDEX idx_Temp_FilterColumn1 ON dbo.TestingIndexUsage
DROP INDEX idx_Temp_FilterColumn2 ON dbo.TestingIndexUsage
DROP INDEX idx_Temp_FilterColumn3 ON dbo.TestingIndexUsage

CREATE INDEX idx_Temp_FilterColumn123 ON dbo.TestingIndexUsage (FilterColumn1, FilterColumn2, FilterColumn3)</pre>
<p>And run the query again.</p>
<p>As kinda expected, the execution plan has a single index seek operation. Exec plan looks cleaner, what do the performance characteristics say?</p>
<blockquote><p>Table &#8216;TestingIndexUsage&#8217;. Scan count 1, logical reads 3, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 0 ms,  elapsed time = 0 ms.</p></blockquote>
<p>Just about says it all. 147 fewer reads and a 100% reduction in CPU cost. The reduction in reads isn&#8217;t going to make a major difference, the reads were low anyway, but the reduction in CPU cost is going to make an impact if this query is frequently run.</p>
<p>So what can we conclude from this?</p>
<p>The optimal index for a query with multiple conditions in the where clause is a single index with all the columns that are used in the where clause in it. The order of these columns may matter, depending on how they are used in the where clause (see <a href="http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/">Equality predicates</a> and <a href="http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/">Inequality predicates</a>)</p>
<p>SQL can use multiple indexes on a single table (Index Intersection), but it&#8217;s not the most efficient option. It&#8217;s worth noting that SQL won&#8217;t always choose to do the index intersection. It may quite well decide that a table/clustered index scan is faster than the multiple seeks and joins that the intersection will do. Or, if one of the conditions is very selective, it may decide to seek on one of the indexes, do key lookups to fetch the rest of the columns and then do secondary filters to evaluate the rest of the predicates.</p>
<p>Now it may not always be possible to create a perfect index for all queries on a table, so in some cases, especially for less important queries, having multiple indexes that SQL can seek and intersect may be adequate, but for the more critical, more frequently run queries you probably want a single index with the appropriate columns.</p>
<p>As an aside, this is why the often-mentioned index &#8216;strategy&#8217; of a single column index on each column of a table is near-useless and certainly not worth the title &#8216;strategy&#8217;.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Is a scan a bad thing?</title>
		<link>http://sqlinthewild.co.za/index.php/2009/07/29/is-a-scan-a-bad-thing/</link>
		<comments>http://sqlinthewild.co.za/index.php/2009/07/29/is-a-scan-a-bad-thing/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 07:24:36 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Indexes]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=170</guid>
		<description><![CDATA[This one comes up from time to time, so I thought I&#8217;d have a go at addressing it. Let&#8217;s imagine a hypothetical DBA who&#8217;s doing some performance tuning. He looks at a query plan for a moderately complex query and panics because there&#8217;s a couple of index scans and he wants to rather see index [...]]]></description>
			<content:encoded><![CDATA[<p>This one comes up from time to time, so I thought I&#8217;d have a go at addressing it.</p>
<p>Let&#8217;s imagine a hypothetical DBA who&#8217;s doing some performance tuning. He looks at a query plan for a moderately complex query and panics because there&#8217;s a couple of index scans and he wants to rather see index seeks.</p>
<p>Is that correct, are index scans bad and index seeks good?</p>
<p>Well, most of the time yes. Most of the time a scan is a problem and indicates a missing index or a query problem, but there are other times that it&#8217;s the most optimal way to get the required rows from the table.</p>
<p>I&#8217;ve previously looked at the <a href="http://sqlinthewild.co.za/index.php/2009/03/05/when-is-a-seek-actually-a-scan/">case where the index seek actually reads the entire table</a>. In this post I&#8217;m going to be evaluating some common query constructs to see when a seek really is the most optimal operator.</p>
<p>Let&#8217;s start with the simplest case, and I&#8217;m going to use the AdventureWorks database for these queries.</p>
<p>select ProductID, Name from Production.Product</p>
<p>In this case I get an index scan on the AK_Product_Name index and that makes perfect sense. I&#8217;m asking for all the rows in the table. There is no way that SQL can use a seek to execute that query. For there to be a seek, there has to be a SARGable predicate within the query that can be used for the seek.</p>
<p><span id="more-170"></span>Now, how about this one</p>
<p>SELECT ProductID, Name from Production.Product WHERE UPPER(Name) LIKE &#8216;MOUNTAIN%&#8217;</p>
<p>There&#8217;s a predicate and it&#8217;s on a column that&#8217;s indexed, but we still get an index scan. The scan is there because that predicate is not a SARGable one because there is a function on the column. Since the DB in question is not case sensitive, there&#8217;s no need for that particular function and, if it&#8217;s removed, we get an index seek. So in this case, the scan is not optimal and we can convert it into a much more efficient index seek with a small modification.</p>
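<p>For reference, the modified query is simply the same statement without the function wrapped around the column (this assumes, as above, a case-insensitive collation):</p>
<pre class="brush: sql; title: ; notranslate">-- No function on the column, so the predicate can be used for an index seek
SELECT ProductID, Name
FROM Production.Product
WHERE Name LIKE 'MOUNTAIN%'</pre>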
<p>So far, so good.</p>
<p>Let&#8217;s try  adding another column to this select.</p>
<p>SELECT ProductID, Name, ProductNumber from Production.Product WHERE Name LIKE &#8216;MOUNTAIN%&#8217;</p>
<p>Again I have a scan, and not just an index scan, I have a clustered index scan. A read of the entire table. Why? The predicate is SARGable and there is an index with that as the leading column.</p>
<p>The index on Name is not covering for this query (it doesn&#8217;t have the column ProductNumber in it). SQL has decided that, based on the number of rows returned, it&#8217;s better for it to scan the cluster than to seek on the noncluster and do a large number of lookups.</p>
<p>In this particular case, the scan of the cluster is better than the seeks and bookmark lookups. When tested with an index hint, the scan of the cluster did 15 logical reads and the seek with bookmark lookup did 79. So, without widening the index, here the scan is the optimal way to run this query.</p>
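<p>A sketch of how that comparison can be run, plus what widening the index could look like. The hint uses the AK_Product_Name index mentioned earlier. The name of the wider index is made up for illustration; it&#8217;s not something that exists in AdventureWorks.</p>
<pre class="brush: sql; title: ; notranslate">SET STATISTICS IO ON
GO

-- Force the nonclustered index on Name, which means lookups for ProductNumber
SELECT ProductID, Name, ProductNumber
FROM Production.Product WITH (INDEX (AK_Product_Name))
WHERE Name LIKE 'MOUNTAIN%'
GO

-- Hypothetical wider index that would cover the query and allow a seek with no lookups
CREATE NONCLUSTERED INDEX idx_Product_Name_ProductNumber
ON Production.Product (Name)
INCLUDE (ProductNumber)
GO</pre>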
<p>One more&#8230;</p>
<p>select p.Name, Sum(sod.LineTotal) AS TotalPerProduct<br />
from Sales.SalesOrderDetail sod<br />
inner join Sales.SalesOrderHeader soh on sod.SalesOrderID = soh.SalesOrderID<br />
inner join Production.Product p on sod.ProductID = p.ProductID<br />
where SalesPersonID = 277<br />
Group by p.name</p>
<p>Here I&#8217;ve got two seeks, on the sales order tables, and a full index scan on the product table, specifically of the index on product name.</p>
<p>The scan is there because of the number of seeks that would otherwise be required to execute this. A seek can only return a single value or a range of values. If a seek is required to evaluate a join, that seek has to run once for each row in the other table. This is the classic nested loop join. In this case, there are 246 rows in the resultset that the products table needs to be joined to. That means, if SQL evaluated this with seeks, it would have to do 246 seeks. There are only 504 rows in total in the products table.</p>
<p>To test this, I can use the ForceSeek hint (SQL 2008 only) to force a seek on the clustered index (the one on the join column) and compare the IOs</p>
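<p>The hinted version of the query would look something like this (a sketch; the only change from the query above is the table hint on the Product table):</p>
<pre class="brush: sql; title: ; notranslate">SET STATISTICS IO ON
GO

select p.Name, Sum(sod.LineTotal) AS TotalPerProduct
from Sales.SalesOrderDetail sod
inner join Sales.SalesOrderHeader soh on sod.SalesOrderID = soh.SalesOrderID
-- FORCESEEK makes the optimiser use a seek to access Product
inner join Production.Product p WITH (FORCESEEK) on sod.ProductID = p.ProductID
where SalesPersonID = 277
Group by p.name</pre>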
<p>Scan of Products:<br />
Table &#8216;Product&#8217;. Scan count 1, logical reads 5, physical reads 0</p>
<p>Forced seek of Products:<br />
Table &#8216;Product&#8217;. Scan count 0, logical reads 492, physical reads 0</p>
<p>That&#8217;s a major difference in the number of IOs.</p>
<p>So, in conclusion&#8230;. Scans are not the ideal query operators, usually are not optimal and can indicate missing indexes or poorly written queries. However there are times that scanning an index or even a table is the most optimal way of processing a query, so if there&#8217;s a query that has an index/table scan in it, maybe spend a few minutes understanding why the scan&#8217;s there in the first place, before spending time trying to get rid of it.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2009/07/29/is-a-scan-a-bad-thing/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>When is a seek actually a scan?</title>
		<link>http://sqlinthewild.co.za/index.php/2009/03/05/when-is-a-seek-actually-a-scan/</link>
		<comments>http://sqlinthewild.co.za/index.php/2009/03/05/when-is-a-seek-actually-a-scan/#comments</comments>
		<pubDate>Thu, 05 Mar 2009 14:09:27 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Indexes]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=173</guid>
		<description><![CDATA[Most people who know SQL execution plans will say, without reservation, that an index seek on a particular index is better than an index scan on the same index. In the vast majority of cases, that&#8217;s true, but there are times when what appears in the execution plan as an index seek is actually an [...]]]></description>
			<content:encoded><![CDATA[<p>Most people who know SQL execution plans will say, without reservation, that an index seek on a particular index is better than an index scan on the same index. In the vast majority of cases, that&#8217;s true, but there are times when what appears in the execution plan as an index seek is actually an index scan.</p>
<p>Let me show an example</p>
<p>CREATE TABLE TestingSeeks (<br />
id int identity (1,1) not null,<br />
SomeStr char(6) default '' -- a filler<br />
)<br />
GO</p>
<p>insert into TestingSeeks (SomeStr)<br />
select top (500000) ''<br />
from sys.columns c1 cross join sys.columns c2</p>
<p>We have a table here with an identity column on it, starting at 1 and incrementing by 1. Hence, there will not be negative values in the table. I&#8217;m going to then put a nonclustered index on that column (the table has no cluster, it&#8217;s a heap).</p>
<p>CREATE NONCLUSTERED INDEX idx_Seek ON TestingSeeks (id)</p>
<p>Fair enough. If I query all the rows in the table and retrieve just the ID column, I&#8217;ll get a scan on that index, as is pretty much expected and Statistics IO tells me that 935 pages were read</p>
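<p>That unfiltered query isn&#8217;t listed here. It would be something like the following sketch, with Statistics IO switched on to get the page count:</p>
<pre class="brush: sql; title: ; notranslate">SET STATISTICS IO ON
GO

-- No predicate, so the whole nonclustered index has to be read
select id from TestingSeeks</pre>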
<p><span id="more-173"></span></p>
<p><img class="alignnone size-full wp-image-218" style="border: 1px solid black;" title="scan" src="http://sqlinthewild.co.za/wp-content/uploads/2009/03/scan.png" alt="" width="412" height="126" /></p>
<p>So a read of the entire index is 935 pages. Now, let me add a filter.</p>
<p>select id from TestingSeeks</p>
<p>where id&gt;0</p>
<p>That predicate is SARGable and there is an appropriate index. Sure enough, we get an index seek here.</p>
<p><img class="alignnone size-full wp-image-219" style="border: 1px solid black;" title="seek" src="http://sqlinthewild.co.za/wp-content/uploads/2009/03/seek.png" alt="" width="416" height="125" /></p>
<p>That&#8217;s good. Isn&#8217;t it?</p>
<p>Well, not really. That filter&#8217;s going to match all the rows in the table. We know there are none with an id less than 1. Statistics IO tells me that 935 pages were read, exactly the same as for the scan. It&#8217;s a seek operation, but it&#8217;s done exactly the same work as the scan did.</p>
<p>Moral of the story: A seek doesn&#8217;t always read only a portion of the index, a seek on an index is not necessarily doing less work than a scan on the same index and silly tricks intended to force an index seek are not going to make a query run faster.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2009/03/05/when-is-a-seek-actually-a-scan/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>All indexes are unique</title>
		<link>http://sqlinthewild.co.za/index.php/2009/02/09/all-indexes-are-unique/</link>
		<comments>http://sqlinthewild.co.za/index.php/2009/02/09/all-indexes-are-unique/#comments</comments>
		<pubDate>Mon, 09 Feb 2009 16:50:48 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Indexes]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=209</guid>
		<description><![CDATA[Well, that&#8217;s a rather contentious title. There are probably several people shaking their heads at this point. Let me explain. I was listening to a podcast with Kimberly Tripp this morning, and she mentioned this briefly. I thought it would be a good discussion to end a short series on indexes and selectivity. The Clustered [...]]]></description>
			<content:encoded><![CDATA[<p>Well, that&#8217;s a rather contentious title. There are probably several people shaking their heads at this point. Let me explain.</p>
<p>I was listening to a <a href="http://runasradio.com/default.aspx?showNum=76">podcast with Kimberly Tripp</a> this morning, and she mentioned this briefly. I thought it would be a good discussion to end a short series on indexes and selectivity.</p>
<p><strong>The Clustered Index</strong></p>
<p>A clustered index has to be unique, because the clustering key acts as the row&#8217;s location in the table. If the index is not defined as unique, SQL will make it unique by adding a uniquifier, a 4-byte integer that&#8217;s hidden behind the scenes and is added when necessary to make the clustered index unique.</p>
<p>It&#8217;s not documented anywhere clearly, but it is mentioned in a couple of places. From <a href="http://msdn.microsoft.com/en-us/library/ms177484.aspx">msdn</a>:</p>
<blockquote><p>If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a <strong>uniqueifier</strong>. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.</p></blockquote>
<p>So all clustered indexes are unique.</p>
<p><span id="more-209"></span><strong>The Nonclustered Index</strong></p>
<p>A nonclustered index contains, in addition to the index key and any include columns, a pointer to the actual row. This is so that the row can be retrieved when other columns are needed for a query (A <a href="http://sqlinthewild.co.za/index.php/2009/01/27/a-bookmark-lookup-by-any-other-name/">bookmark lookup</a>)</p>
<p>When the table has a clustered index, this pointer is the clustered index key. When the table does not have a clustered index, the pointer is the RID, a combination of file ID, page ID and slot index (which gives the row&#8217;s logical position on the page). These pointers are not just stored at the leaf level of the index, they&#8217;re stored at the higher levels as well, something that a bit of poking with DBCC Page can easily verify. (Unless the nonclustered index is defined unique, in which case it&#8217;s just at the leaf level.)</p>
<p>As was proven above, the clustering key is unique. The RID, since it points to the row&#8217;s actual position, is also unique. There&#8217;s no way that two rows can be in the same place on a page.</p>
<p>Hence, since part of the nonclustered index is unique, the entire index has to be unique.</p>
<p>So all nonclustered indexes are also unique.</p>
<p>Q.E.D.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2009/02/09/all-indexes-are-unique/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Index columns, selectivity and inequality predicates</title>
		<link>http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/</link>
		<comments>http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/#comments</comments>
		<pubDate>Thu, 05 Feb 2009 22:41:42 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Indexes]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=203</guid>
		<description><![CDATA[So, following on from my post last week, I&#8217;m going to take a look at how selectivity and index column order affect inequality predicates. One thing to note straight off is that the selectivity of a column is much less important for inequality predicates than it was for equality. For equality predicates, the selectivity alone [...]]]></description>
			<content:encoded><![CDATA[<p>So, following on from my <a href="http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/">post last week</a>, I&#8217;m going to take a look at how selectivity and index column order affect inequality predicates.</p>
<p>One thing to note straight off is that the selectivity of a column is much less important for inequality predicates than it was for equality. For equality predicates, the selectivity alone can give a reasonable idea of the number of rows a particular predicate will return. That&#8217;s not the case with inequalities. Also, with inequality predicates, the order of columns in the index becomes very important.</p>
<p>One of the most important considerations with inequality predicates is the number of rows that the predicate will return. An identity column may be highly selective, but if the filter is for all rows &gt; 0 and the identity values start at one, then an index on that column is not going to be very useful.</p>
<p>The other consideration when there are inequality predicates is that only that column and columns to the left of it in the index key can be used for index seeks. Any columns to the right of the column with the inequality are no longer eligible for seeking.</p>
<p>To explain with an example, consider our hypothetical table from the previous post (with one small change):</p>
<pre class="brush: sql; title: ; notranslate">CREATE TABLE ConsideringIndexOrder (
ID INT,
SomeString VARCHAR (100),
SomeDate DATETIME DEFAULT GETDATE()
);  </pre>
<p>The same as previously, there&#8217;s a single nonclustered index on all three columns, in the order ID, SomeDate, SomeString.</p>
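<p>The index definition isn&#8217;t repeated in this post. As a sketch, it would be along these lines (the index name is a placeholder of mine, not taken from the previous post):</p>
<pre class="brush: sql; title: ; notranslate">-- Key column order matters: ID, then SomeDate, then SomeString
CREATE NONCLUSTERED INDEX idx_ConsideringIndexOrder
ON ConsideringIndexOrder (ID, SomeDate, SomeString)</pre>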
<p>If there&#8217;s an inequality predicate, then the index is only fully seekable for the following queries<br />
…  WHERE ID = @ID AND SomeDate = @dt AND SomeString &gt; @str<br />
…  WHERE ID = @ID AND SomeDate &gt; @dt<br />
…  WHERE ID &gt; @ID</p>
<p><span id="more-203"></span>If there&#8217;s another predicate, equality or inequality, on a column further to the right in the index, that cannot be executed as part of the index seek, and will be done as a second step, just as happened with equalities when the predicates were not left-based subsets of the index columns.</p>
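<p>As an illustration of that second step, here&#8217;s a sketch of a query against the same hypothetical table where only part of the where clause can be used for the seek. The variable values are arbitrary.</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @ID INT, @dt DATETIME
SELECT @ID = 100, @dt = '20090201'

-- ID &gt; @ID is on the leading column, so it can be the seek predicate.
-- SomeDate is to the right of the inequality column, so SomeDate = @dt
-- is checked against each row that the seek returns.
SELECT ID, SomeDate, SomeString
FROM ConsideringIndexOrder
WHERE ID &gt; @ID AND SomeDate = @dt</pre>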
<p>So, what does that mean for index column order? Quite simply, if queries are always going to filter with one or more equality predicates and one or more inequality predicates, the columns used for the inequalities must appear further to the right in the index than the equalities.</p>
<p>That&#8217;s great when there&#8217;s only one inequality predicate, but what happens when there&#8217;s more than one? If there is going to be more than one inequality predicate, the one that is likely to return fewer rows should go earlier in the index. That doesn&#8217;t mean the column that is most selective overall, but the one that will be queried with the more selective range.</p>
<p>Using the above table as an example, if a typical query will run with an inequality on the ID column that on average will return 1000 rows and with an inequality on the date column that will on average return 100 rows, then the date column should go before the ID in the index (assuming that&#8217;s the only query).</p>
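<p>A sketch of what that might look like follows. The index name and the variable values are hypothetical, and it assumes the date range really is the narrower of the two for the typical query.</p>
<pre class="brush: sql; title: ; notranslate">-- Hypothetical index: the column with the narrower range (SomeDate) leads
CREATE NONCLUSTERED INDEX idx_ConsideringIndexOrder_Date_ID
ON ConsideringIndexOrder (SomeDate, ID);

-- Placeholder values, purely for illustration
DECLARE @ID INT = 5000, @dt DATETIME = '20090101';

SELECT ID, SomeDate
FROM ConsideringIndexOrder
WHERE SomeDate &gt; @dt -- expected to be the seek: the narrower range (around 100 rows)
  AND ID &gt; @ID;      -- expected to be checked against each row the seek returns</pre>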
<p>Let’s take a look at some query scenarios based on the hypothetical table above to see how that index will be used with some inequality predicates.</p>
<p><strong>Scenario 1: Inequality predicate on the ID column</strong></p>
<p>This is probably the simplest of the inequalities. Since ID is the leading column of the index, SQL does a seek to find the beginning of the range (or the first row in the table if applicable) and then reads along the leaf pages of the index until it reaches the end of the range. Those rows are then returned.</p>
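<p>A query of this shape might look like the following. The value is a placeholder, and this isn&#8217;t necessarily the exact query behind the screenshot.</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @ID INT = 1000; -- placeholder value

-- Range on the leading index column: seek to the start of the range,
-- then read along the leaf pages to the end of it
SELECT ID, SomeDate, SomeString
FROM ConsideringIndexOrder
WHERE ID &gt; @ID;</pre>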
<p><img class="alignnone size-full wp-image-206" style="border: 1px solid black;" title="seek1" src="http://sqlinthewild.co.za/wp-content/uploads/2009/02/seek1.png" alt="" width="309" height="174" /></p>
<p><strong>Scenario 2: Equality match on the ID column and inequality on the Date column</strong></p>
<p>This one&#8217;s also fairly easy. SQL seeks to find a matching ID and the start of the range and then reads along the index to find the rest of the rows.</p>
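<p>Again as a sketch with placeholder values, a query matching this scenario could be written as:</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @ID INT = 1000, @dt DATETIME = '20090101'; -- placeholder values

-- Equality on ID, then a range on SomeDate, the next column in the index key:
-- both predicates can be used for the seek
SELECT ID, SomeDate, SomeString
FROM ConsideringIndexOrder
WHERE ID = @ID
  AND SomeDate &gt; @dt;</pre>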
<p><img class="alignnone size-full wp-image-207" style="border: 1px solid black;" title="seek2" src="http://sqlinthewild.co.za/wp-content/uploads/2009/02/seek2.png" alt="" width="308" height="197" /></p>
<p><strong>Scenario 3: Inequality match on both the ID and Date columns</strong></p>
<p>In this case, only one of the predicates can be used as a seek predicate; the other will be executed as a predicate, meaning that each row that the seek retrieves has to be compared against that predicate. Since the index starts with ID, it&#8217;s the inequality on ID that will be picked for the seek. If there were a second index that started with date, that one might be picked instead.</p>
<p><img class="alignnone size-full wp-image-208" style="border: 1px solid black;" title="seek-3" src="http://sqlinthewild.co.za/wp-content/uploads/2009/02/seek-3.png" alt="" width="309" height="296" /></p>
<p>While both columns are mentioned in the seek predicate, note that there&#8217;s also a predicate on the SomeDate column, which is not present in the simple index seeks.</p>
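<p>For reference, a query of this shape might look like the following. The values are placeholders, and the comments assume that the index on ID, SomeDate, SomeString is the only one on the table.</p>
<pre class="brush: sql; title: ; notranslate">DECLARE @ID INT = 1000, @dt DATETIME = '20090101'; -- placeholder values

SELECT ID, SomeDate, SomeString
FROM ConsideringIndexOrder
WHERE ID &gt; @ID        -- leading column: this range drives the seek
  AND SomeDate &gt; @dt; -- also compared against every row the seek reads</pre>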
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

