Seek or scan?

One very common question that I see on the forums is on index seeks and index scans. A query is resulting in a table/clustered index scan, even though there’s an index on one or more of the columns been searched on.

One of the more common reasons that this happens is because the index in question is not covering, and SQL has determined that the cost of doing the lookups to fetch the extra columns is higher than the cost of scanning the entire table.

If an index does not cover a query, then bookmark lookups are required to get the additional columns, bookmark lookups are run one row at a time, and are seeks on the clustered index. Hence it’s clear that bookmark lookups on a large number of rows are exceedingly expensive and that is why SQL will switch to a clustered index/table scan when lookups are required on a significant percentage of the rows in the table.

So, what constitutes a significant percentage of the rows in the table? 50%? 20%? 10%?

The answer’s often surprising. It’s usually under 1% of the total rows in the table.

Here’s some test code to demonstrate that.

Setup code:

CREATE TABLE dbo.SeekOrScan(
 ID int IDENTITY(1,1) NOT NULL primary key,
 SomeNumber int,
 padding char(100)
)

CREATE NONCLUSTERED INDEX idx_SeekOrScan ON dbo.SeekOrScan (SomeNumber)

Test code:

insert into SeekOrScan (SomeNumber)
select top 1000000 0
from master..spt_values a cross join master..spt_values b
where a.name is null and b.name is null

update SeekOrScan set SomeNumber = ID -- just so we've got sequential numbers for the between
GO

dbcc freeproccache

select * from SeekOrScan
where somenumber between 1 and 100000 -- 10% of table
-- Clustered index scan

dbcc freeproccache

select * from SeekOrScan
where somenumber between 1 and 50000 -- 5% of table
-- Clustered index scan

dbcc freeproccache

select * from [SeekOrScan]
where somenumber between 1 and 10000 -- 1%
-- Clustered index scan

dbcc freeproccache

select * from [SeekOrScan]
where somenumber between 1 and 5000  -- 0.5% of table
-- clustered index scan

dbcc freeproccache

select * from [SeekOrScan]
where somenumber between 1 and 3000  -- 0.3% of table
-- nonclustered index seek

So somewhere around 0.4% of the table, the index seek with key lookup became more expensive than a table scan. The result is specific to this table and will differ for tables of different row size because the point at which a scan of the clustered index becomes cheaper than multiple key lookups depends on the number of pages in the table.

14 Comments

Jack D Corbett 9 January 2009 at 00:09

Another reason to make sure you create covering indexes

Reply ↓
Pingback: Links for the week of 2009-01-09 | facility9
Gail 10 January 2009 at 11:01

Jack: True, though, while the INCLUDE means that all queries can be covered, IMHO not all queries should be covered. Sometimes the size or number of indexes required to cover *everything* means it’s just not worth it.

Reply ↓
Christopher Stobbs 23 January 2009 at 17:36

I tested the above code and noticed that I either get Clustered index scans or a an index seek joined onto a Clustered index seek. I didn’t see any bookmarks or key lookups? Have I missed something or was the intention to show that a scan can be better than two seeks?

Reply ↓
Gail 23 January 2009 at 18:26

Let me guess, SQL 2005 RTM or SP1?

The operator was called a bookmark lookup in SQL 2000 and before. In SQL 2005 SP2 and later it was a key lookup.

In SQL 2005 RTM and SP1 it appeared as a clustered index seek, and hence often went unnoticed (because clustered index seeks are good)

Reply ↓
Christopher Stobbs 26 January 2009 at 11:10

he he he Yip SP1, I’ve tried it on another machine and it all seems good 🙂 thanks for that

Reply ↓
Pingback: Now syndicating Gail Shaw and Kendal Van Dyke | Brent Ozar - SQL Server DBA
Cedric 18 March 2009 at 14:14

Even creating a covering index in this instance will still yield a clustered index scan if the columns on which the covering index are defined is not part explicitly stated in the SELECT clause ,until the point where the ratio of rows like Gail illustrated above is reached.

If we changed one of the statements above to :

select somenumber from [SeekOrScan]
where somenumber between 1 and 50000

Then we see a non clustered index seek

Reply ↓
Gail 18 March 2009 at 16:41

Cedric, are you saying that if I create an index (SomeNumber) INCLUDE (id, padding) and then run “select * from [SeekOrScan] where somenumber between 1 and 50000” that I’ll still get a clustered index scan?

Sure, if you just specify the indexes column in the select it will be an index seek, because no bookmark lookup is required, the index alone can satisfy the query. It’s only when there’s a bookmark lookup that there’s a ‘tipping point’

Reply ↓
Cedric 16 April 2009 at 01:56

Sorry I should have specifically mentioned that I was refering to the index defined as : CREATE NonCLUSTERED INDEX [idx_SeekOrScan] ON [dbo].[SeekOrScan] (SomeNumber) my tiping point seems to different than yours.I need more information in this regard.I will do some more investigation .

Agreed that if you add the INCLUDE (id, padding) that a non clustered index seek will be used.

I should have maybe worded my post differently . If I include the SomeNumber field explicitly in my select list as the only field then the non clustered index seems to be covering even if I add the ID field it still used the non clustered index seek ,but as soon as I add the padding field the query plan swings around to a clustered index scan

Does all this mean that depeinding on your select list that the tipping point can potentially be different if you explicitly state your field list ?

Reply ↓
Gail 16 April 2009 at 10:10

No. It’s because the ID column is part of that nonclustered index. ID is the clustered index key and hence is part of all nonclustered indexes because it’s the row’s address. (see – https://www.sqlinthewild.co.za/index.php/2009/02/09/all-indexes-are-unique/)

So your query with ID and SomeNumber was reading a covering nonclustered index.

Reply ↓
Jason 16 June 2010 at 23:49

Any thoughts on this related post on Stack Overflow? Mike M. referenced this article, but at this time of writing, we’re still a little confused …

http://stackoverflow.com/questions/3056321/sql-2005-indexed-queries-slower-than-unindexed-queries

Reply ↓
NInja Rg'Rus 15 August 2011 at 18:52

A very interesting post on the same subject by Kimberly Tripp.

http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Tipping-Point-Query-Answers.aspx

Reply ↓
Pingback: Performace Tuning Stored Procedures « SQL Canada

Seek or scan?

Share this:

14 Comments

Leave a Comment Cancel reply