One wide index or multiple narrow indexes?

Or “If one index is good, surely many indexes (indexes? indices? indi?) will be better”

This is a question that comes up very often on the forums. Something along the lines of:

I have a query with multiple where clause conditions on a table. Should I create one index for each condition, or one index with all the columns in it?

The question basically boils down to this: Which is more optimal and more likely for the optimiser to pick, a single seek operation against a wide index that seeks on all three conditions in one go, or three seek operations against three indexes followed by a join to get back the final set of rows.

One thing to keep in mind is that one of the jobs of an index is to reduce the number of rows in consideration for a query as early as possible in the query’s execution.

So let’s take a made-up example. Let’s say we have a table with a number of columns in it. A query is run against that table with three conditions in the where clause

WHERE ColA = @A AND ColB = @B AND ColC = @C

Let’s further say that 1000 rows qualify for the condition ColA = @A, 15000 rows qualify for ColB = @B and 30000 rows qualify for ColC = @C. The total number of rows that qualify for all three conditions is 25.

Which sounds like it would be more efficient?

Seek on an index with all three columns and retrieve just 25 rows
Seek on an index on ColA, retrieve 1000 rows, seek on an index on ColB, retrieve 15000 rows, seek on an index on ColC, retrieve 30000 rows then join the three result-sets together to get the desired 25 rows (called an Index Intersection)

Time for some tests to find out.

CREATE TABLE TestingIndexUsage (
id INT IDENTITY PRIMARY KEY,
FilterColumn1 INT,
FilterColumn2 INT,
FilterColumn3 INT,
Filler CHAR(500) DEFAULT ''-- simulate other columns in the table.
)
GO

INSERT INTO TestingIndexUsage (FilterColumn1, FilterColumn2, FilterColumn3)
SELECT TOP ( 1000000 )
ABS(CHECKSUM(NEWID()))%200,
ABS(CHECKSUM(NEWID()))%40,
ABS(CHECKSUM(NEWID()))%20
FROM msdb.sys.columns a CROSS JOIN msdb.sys.columns b
GO

First off, I’m going to create three individual indexes on the three filter columns and see what kind of plan SQL comes up with.

CREATE INDEX idx_Temp_FilterColumn1 ON dbo.TestingIndexUsage (FilterColumn1)
CREATE INDEX idx_Temp_FilterColumn2 ON dbo.TestingIndexUsage (FilterColumn2)
CREATE INDEX idx_Temp_FilterColumn3 ON dbo.TestingIndexUsage (FilterColumn3)

And the query…

SELECT ID FROM dbo.TestingIndexUsage
WHERE FilterColumn1 = 68 -- 4993 matching rows
AND FilterColumn2 = 26 -- 24818 matching rows
AND FilterColumn3 = 3  -- 49915 matching rows

The comments show how many rows each predicate returns alone. Combined they return 19 rows.

The plan shows the semi-expected index intersection. Seeks on 3 indexes, two merge join operators to join the three resultsets into one.

But what about the performance characteristics?

Table ‘TestingIndexUsage’. Scan count 3, logical reads 150, physical reads 0.

SQL Server Execution Times:
CPU time = 422 ms, elapsed time = 435 ms.

The reads aren’t very high (as the indexes are extremely narrow), but that CPU time is not exactly low. Almost half a second on the CPU to return 19 rows from a 1 million row table? Not good, especially if this is going to run often.

Right, so that’s the three separate indexes. What about the case of a single index with all three columns. In this case, because all three are SARGable equality predicates, the order of the columns isn’t critical for index usage, so I’ll put them in order of selectivity.

DROP INDEX idx_Temp_FilterColumn1 ON dbo.TestingIndexUsage
DROP INDEX idx_Temp_FilterColumn2 ON dbo.TestingIndexUsage
DROP INDEX idx_Temp_FilterColumn3 ON dbo.TestingIndexUsage

CREATE INDEX idx_Temp_FilterColumn123 ON dbo.TestingIndexUsage (FilterColumn1, FilterColumn2, FilterColumn3)

And run the query again.

As kinda expected, the execution plan has a single index seek operation. Exec plan looks cleaner, what do the performance characteristics say?

Table ‘TestingIndexUsage’. Scan count 1, logical reads 3, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

Just about says it all. 147 fewer reads and a 100% reduction in CPU cost. The reduction in reads isn’t going to make a major difference, the reads were low anyway, but the reduction in CPU cost is going to make an impact if this query is frequently run.

So what can we conclude from this?

The optimal index for a query with multiple conditions in the where clause is a single index with all the columns that are used in the where clause in it. The order of these columns may matter, depending on how they are used in the where clause (see Equality predicates and Inequality predicates)

SQL can use multiple indexes on a single table (Index Intersection), but it’s not the most efficient option. It’s worth nothing that SQL won’t always chose to do the index intersection. It may quite well decide that a table/clustered index scan is faster than the multiple seeks and joins that the intersection will do. Or, if one of the conditions is very selective, it may decide to seek on one of the indexes, do key lookups to fetch the rest of the columns and then do secondary filters to evaluate the rest of the predicates.

Now it may not always be possible to create a perfect index for all queries on a table, so in some cases, especially for less important queries, having multiple indexes that SQL can seek and intersect may be adequate, but for the more critical, more frequently run queries you probably want a single index with the appropriate columns.

As an aside, this is why the often-mentioned index ‘strategy’ of a single column index on each column of a table is near-useless and certainly not worth the title ‘strategy’.

31 Comments

Klaus Aschenbrenner 14 September 2010 at 16:48

Hi,

What would be the ideal index design, when each column in the WHERE clause is optional, eg. “WHERE ColA = @A AND ColC = @C”, or “WHERE ColB = @B AND ColC = @C”? In this case it would be the better option to work with 3 separate indexes, or?

Thanks

-Klaus

Reply ↓
Pradeep Singh 14 September 2010 at 17:12

Thanks for this article. Very clear explaination of the difference between these two types of index usage.

Reply ↓
Gail (Post author)14 September 2010 at 17:34

Klaus, in that cast you’d probably want 2 different 2-column wide indexes. (A, C) and (B, C), but it really depends on how the queries are designed and working, and how selective those columns are.

What you want is the minimum number of indexes to fully (if possible) support the queries.

Reply ↓
Andy Leonard 14 September 2010 at 18:19

Great post!

Question: Does INCLUDE impact this? What are your thoughts on creating an index on FilterColumn1 that includes values for FilterColumn2 and FilterColumn3?

:{> Andy

Reply ↓
Gail (Post author)14 September 2010 at 19:24

Not if all three columns are filtered on. With FilterColumns 2 and 3 only as include columns, SQL can’t (obviously) seek on them. It can only seek on Column1.

Now sure, those columns are in the index, so no index intersection or additional lookups are required, but SQL still has to retrieve far more data than is actually needed and then filter out the rows it doesn’t need.

It goes back to one of my key principles for indexes – allow SQL to reduce the number of rows as much as possible as early as possible.

Would you prefer to read all the Browns in the telephone directory looking for M. Brown who lives in Benoni, or would you prefer to be able to go straight to Brown, M. Benoni?

Reply ↓
Thomas LeBlanc 15 September 2010 at 15:45

Gila,

Always detailed and informative, great post. Thanks for helping those in need of solid direction on SQL Server.

Thomas LeBlanc
TheSmilingDBA

Reply ↓
Pingback: T-SQL Tuesday #10 Round up, An index of participants | Michael J. Swart
Steve Dorris 5 October 2010 at 17:44

Gail,
Not only a great article, but excellent examples. I’ve learned a great deal about the internals of SQL Server from all of your articles. Thanks for sharing!
Steve

Reply ↓
Gail (Post author)5 October 2010 at 22:59

Glad you’re finding them useful.

Reply ↓
Eddie Zambrana 6 October 2010 at 23:56

Hi Claus

The one index solution will still be the best choice even if you have three separate where clauses. The reason for this is that the execution plan will be the same for each of the where statements and you get the added benefit of having only one index for Insert, Deletions and Additions which will save a lot of time.

Hope it helps out.

Reply ↓
Gail (Post author)7 October 2010 at 01:16

I don’t agree.

If the index is on three columns (A, B, C), any query that does not filter on A cannot seek on that index. See https://www.sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

Reply ↓
Mark 8 October 2010 at 00:14

I’ve wondered about these questions myself… Thank you for the post and for answering the questions in the comments Gail.

Reply ↓
JJEugene 30 November 2010 at 20:05

So well written, easy to understand, and to the point. Thanks.

Reply ↓
Khalid Flynn 14 December 2010 at 16:43

Excellent article, thanks for sharing.

Reply ↓
Prashant 7 June 2011 at 09:22

Great Explanation..!!!!!

Reply ↓
Sean Smith 3 July 2011 at 19:02

Direct, to the point, and brings about a lot of clarity. Great stuff!

Reply ↓
Gail (Post author)3 July 2011 at 20:07

Thanks. Glad you found it useful.

Reply ↓
MadTester 30 December 2011 at 16:45

Once again your knowledge puts DBA’s to shame.

Reply ↓
Anurag 9 January 2012 at 03:37

Excellent.. !!!

Reply ↓
Utsav 23 July 2012 at 15:28

What a learning it was to read this article. I have a question here from architecture point of view.When you are not sure that what queries will be written over created table; what should be indexing strategy? We are using independent indexes in such cases and I think I am going to follow it even after getting convinced from this article.

Reply ↓
Gail (Post author)23 July 2012 at 18:17

That’s not all that uncommon a scenario.

You trace and monitor and create indexes to support the most frequent, most important queries that you pick up in your monitoring.

Reply ↓
Aaron 24 July 2012 at 22:50

Post is really enlightening. How do i create indexes for table A, on which i sometimes i query on 3 columns and sometimes i query on 2 columns. From the above Post i should create 2 Index one have 3 columns and another one which has 2 columns ? Please advise

Reply ↓
Aaron 24 July 2012 at 22:52

Forgot to mention that Table A has only Col1, Col2 and Col3 columns.

Reply ↓
Gail (Post author)26 July 2012 at 19:48

I suggest you post that as a forum question somewhere like SQLServerCentral. Without details of what then where clauses actually look like, that’s very hard to answer.

See https://www.sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/ and https://www.sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/

Reply ↓
laifukang 2 January 2013 at 12:46

Well, I was curious as to what multi-index advantages MS SQL Server had over MySQL… turns out none. Thanks for the analysis.

Reply ↓
Unmesh 6 October 2014 at 11:40

I have a table which has around 50 columns. But there are only 6 columns on which we perform regular search. When I am trying to create a non clustered index on it I am getting the warning of 900 bytes. This would be a problem as we also do bulk update/inserts. So can onyone help me in finding a solution?

If I put few columns out of those 6 in INCLUDE, will that cause any performance issue?

Lets say the datatypes for the columns are –

ColumnA (nvarchar(255),null)
ColumnB (Int, null)
ColumnC (nvarchar(255),null)
ColumnD (Datetime,Null)
ColumnE (nvarchar(255),null)

NOTE: All these above columns are used in the same query to retrieve data. Please let me know an efficient way of doing it.

Reply ↓
JK Wood 10 November 2014 at 22:56

@Unmesh:

Do your nvarchar columns HAVE to be 255 characters? If you’re not using it, it’s a waste of space. Figure out a length that fits your data, then use it. That should make your NCI creation much easier – this is a data type problem, not an indexing problem.

Reply ↓
Kerneels Roos 6 February 2015 at 12:41

Great post and nice comments, thanks!

However, the single wide index would not be the best choice for a query where the predicates of the where clause are disjunctive (combined with OR) in contrast with to your example where they are conjunctive (AND), i.e.:

WHERE col1 = @col1 OR col2 = @col2 OR col3 = @col3

Intuitively one can see that the single wide index would not be of much use here, since finding the sub branch where col1 = @col1 means very little in terms of col2 and col3 (they may or may not match in the sub branch of the index tree, and might or might not match even where col1 != @col1).

I suspect three seperate indexes would be a better choice here.

– Kerneels

Reply ↓
1. Gail (Post author)7 February 2015 at 22:15
  
  For ORs, see my post https://www.sqlinthewild.co.za/index.php/2011/05/03/indexing-for-ors/
  
  Reply ↓
moojjoo 30 September 2015 at 20:52

In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query. The term is derived from a contraction of Search ARGument ABLE. Interesting to learn something new.

Reply ↓
Sudhanshu 18 March 2018 at 20:36

very nice post Gail, good explained with example.

Reply ↓

One wide index or multiple narrow indexes?

Share this:

31 Comments

Leave a Comment Cancel reply