This one comes up from time to time, so I thought I’d have a go at addressing it.
Let’s imagine a hypothetical DBA who’s doing some performance tuning. He looks at a query plan for a moderately complex query and panics because there’s a couple of index scans and he wants to rather see index seeks.
Is that correct, are index scans bad and index seeks good?
Well, most of the time yes. Most of the time a scan is a problem and indicates a missing index or a query problem, but there are other times that it’s the most optimal way to get the required rows from the table.
I’ve previously looked at the case where the index seeks actually reads the the entire table, in this post I’m going to be evaluating some common query constructs to see when a seek really is the most optimal operator.
Let’s start with the simplest case, and I’m going to use the AdventureWorks database for these queries.
select ProductID, Name from Production.Product
In this case I get an index scan on the AK_Product_Name index and that makes perfect sense. I’m asking for all the rows in the table. there is no way that SQL can use a seek to execute that query. For there to be a seek, there has to be a SARGable predicate within the query that can be used for the seek.
Now, how about this one
SELECT ProductID, Name from Production.Product WHERE UPPER(Name) LIKE ‘MOUNTAIN%’
There’s a predicate and it’s on a column that’s indexed, but we still get an index scan. The scan is there because that predicate is not a SARGable one because there is a function on the column. Since the DB in question is not case sensitive, there’s no need for that particular function and, if it’s removed, we get an index seek. So in this case, the scan is not optimal and we can convert it into a much more efficient index seek with a small modification.
So far, so good.
Let’s try adding another column to this select.
SELECT ProductID, Name, ProductNumber from Production.Product WHERE Name LIKE ‘MOUNTAIN%’
Again I have a scan, and not just an index scan, I have a clustered index scan. A read of the entire table. Why? The predicate is SARGable and there is an index with that as the leading column.
The index on ProductNumber is not covering for this query (it doesn’t have the columns ProductNumber in it). SQL has decided that, based on the number of rows returned, it’s better for it to scan the cluster than to seek on the noncluster and do a large number of lookups.
In this particular case, the scan of the cluster is better than the seeks and bookmark lookups. When tested with an index hint, the scan of the cluster did 15 logical reads and the seek with book mark lookup did 79. So, without widening the index, here the scan is the optimal way to run this query.
select p.Name, Sum(sod.LineTotal) AS TotalPerProduct
from Sales.SalesOrderDetail sod
inner join Sales.SalesOrderHeader soh on sod.SalesOrderID = soh.SalesOrderID
inner join Production.Product p on sod.ProductID = p.ProductID
where SalesPersonID = 277
Group by p.name
Here I’ve got two seeks, on the sales order tables, and a full index scan on the product table, specifically of the index on product name.
The scan is there because of the number of seeks that would otherwise be required to execute this. A seek can only return a single value or a range of values. If a seek is required to evalute a join, that seek has to run once for each row in the other table. This is the classic nested loop join. In this case, there are 246 rows in the resultset that the products table needs to be joined to. That means, if SQL evaluated this with seeks, it would have to do 246 seeks. There are only 504 rows in total in the products table.
To test this, I can use the ForceSeek hint (SQL 2008 only) to force a seek on the clustered index (the one on the join column) and compare the IOs
Scan of Products:
Table ‘Product’. Scan count 1, logical reads 5, physical reads 0
Forced seek of Products:
Table ‘Product’. Scan count 0, logical reads 492, physical reads 0
That’s a major difference in the number of IOs.
So, in conclusion…. Scans are not the ideal query operators, usually are not optimal and can indicate missing indexes or poorly written queries. However there are times that scanning an index or even a table is the most optimal way of processing a query, so if there’s a query that has an index/table scan in it, maybe spend a few minutes understanding why the scan’s there in the first place, before spending time trying to get rid of it.