So, following on from my post last week, I’m going to take a look at how selectivity and index column order affect inequality predicates.
One thing to note straight off is that the selectivity of a column is much less important for inequality predicates than it was for equality. For equality predicates, the selectivity alone can give a reasonable idea of the number of rows a particular predicate will return. That’s not the case with inequalities. Also, with inequality predicates, the order of columns in the index becomes very important.
One of the most important considerations with inequality predicates is the number of rows that the predicate will return. An identity column may be highly selective, but if the filter is for all rows > 0 and the identity values start t one, then an index on that column is not going to be very useful.
The other consideration when there are inequality predicates is that only that column and columns to the left of it in the index key can be used for index seeks. Any columns to the right of the column with the inequality is no longer eligible for seeking.
To explain with an example, consider our hypothetical table from the previous post (with one small change):
CREATE TABLE ConsideringIndexOrder ( ID INT, SomeString VARCHAR (100), SomeDate DATETIME DEFAULT GETDATE() );
The same as previously, there’s a single nonclustered index on all three columns, in the order ID, SomeDate, SomeString.
If there’s an inequality predicate, then then the index is only fully seekable for the following queries
… WHERE ID = @ID AND SomeDate = @dt AND SomeString > @str
… WHERE ID = @ID AND SomeDate > @dt
… WHERE ID > @ID
If there’s another predicate, equality or inequality, on a column further to the right in the index, that cannot be executed as part of the index seek, and will be done as a second step, just as happened with equalities when the predicates were not left-based subsets of the index columns.
So, what does that mean for index columns order? Quite simply, if queries are always going to filter with one or more equality predicates and one or more inequality predicates, the columns used for the inequalities must appear further to the right in the index than the equalities.
That’s great when there’s only one inequality predicate, but what happens when there’s more than one? If there are going to be more than one inequality predicate, the one that is likely to return fewer rows should go earlier in the index. This is not to say the most selective one, but the one that will be queried with a more selective range.
Using the above table as an example, if a typical query will run with an inequality on the ID column that on average will return 1000 rows and with an inequality on the date column that will on average return 100 rows, then the date column should go before the ID in the index (assuming that’s the only query)
Let’s take a look at some query scenarios based on the hypothetical table above to see how that index will be used with some inequality predicates.
Scenario 1: Inequality predicate on the ID column
This is probably the simplest of the inequalities. Since ID is the leading column of the index, SQL does a seek to find the beginning of the range (or the first row in the table if applicable) and then reads along the leaf pages of the index until it reaches the end of the range. Those rows are then returned.
Scenario 2: Equality match on the ID column and inequality on the Date column
This one’s also fairly easy. SQL seeks to find a matching ID and the start of the range and then reads along hte index to find the rest of the rows.
Scenario 3: Inequality match on both the ID and Date columns
In this case, only one of the predicates can be used as a seek predicate, the other will be executed as a predicate, meaning that each row that the seek retrieves has to be compared against that predicate. Since the index starts with ID, it’s the inequality on ID that will be picked for the seek. If there was a second index that started with date, that one might be picked instead.
While both columns are mentioned in the seek predicate, note that there’s also a predicate on the SomeDate column, which is not present in the simple index seeks.
Thanks for these posts on indexing, they are really informative.
Pleasure. I’m glad you’re enjoying them
Thanks man. Your articles are really informative. Thanks
Pingback: Now syndicating Gail Shaw and Kendal Van Dyke | Brent Ozar - SQL Server DBA
Place Equality column first in your indexes,
you better be bold,
or the index will not be fully seek-able,
so I’ve been told.
Now if you know what is good for you,
then you’ll think of this twice,
So you can remember the nursery rhythm,
that offered this advice.
Wonderful. Love the verse.
If I run “sp_createstats ‘indexonly'”, SQL will create statistics for all columns being used in an index. If so, is the location of columns in an index still an issue?
Statistics are not indexes. Everything I’ve said about the ability for a query to use an index based on column order applies whether or not there are statistics on the columns or not
Great work. Thank you very much for all the wonderful and informative posts.I have been wondering how would one speed up a range query on datetime when the query does nothing but an index seek and takes more than 10 secs to return over a million rows?
IO subsystem and network speed most likely. Could also be blocking.
You’d have to check the wait type the query incurs
Does statistics count towards index limit for SQL table?
Not in SQL 2005 or above.
It’s really wonderful. Thank you very much for this informative post.
I found these post on indexes very informative and useful. The one question I have is that when I was reading Microsoft’s Index Design Guide it seems to recommend the exact opposite of your recommendation of placing the column with the inequality last in the index. Here is a quote:
.
Am I misunderstanding something?
Test my recommendation, test Microsoft’s and decide for yourself who’s right.
You may want to read this first: https://www.sqlinthewild.co.za/index.php/2011/02/01/is-a-clustered-index-best-for-range-queries/