One wide index or multiple narrow indexes?

TSQL2sDay150x150 Or “If one index is good, surely many indexes (indexes? indices? indi?) will be better

This is a question that comes up very often on the forums. Something along the lines of:

I have a query with multiple where clause conditions on a table. Should I create one index for each condition, or one index with all the columns in it?

The question basically boils down to this: Which is more optimal and more likely for the optimiser to pick, a single seek operation against a wide index that seeks on all three conditions in one go, or three seek operations against three indexes followed by a join to get back the final set of rows.

One thing to keep in mind is that one of the jobs of an index is to reduce the number of rows in consideration for a query as early as possible in the query’s execution.

So let’s take a made-up example. Let’s say we have a table with a number of columns in it. A query is run against that table with three conditions in the where clause

WHERE ColA = @A AND ColB = @B AND ColC = @C

Let’s further say that 1000 rows qualify for the condition ColA = @A, 15000 rows qualify for ColB = @B and 30000 rows qualify for ColC = @C. The total number of rows that qualify for all three conditions is 25.

Which sounds like it would be more efficient?

  • Seek on an index with all three columns and retrieve just 25 rows
  • Seek on an index on ColA, retrieve 1000 rows, seek on an index on ColB, retrieve 15000 rows, seek on an index on ColC, retrieve 30000 rows then join the three result-sets together to get the desired 25 rows (called an Index Intersection)

Time for some tests to find out.

CREATE TABLE TestingIndexUsage (
id INT IDENTITY PRIMARY KEY,
FilterColumn1 INT,
FilterColumn2 INT,
FilterColumn3 INT,
Filler CHAR(500) DEFAULT ''-- simulate other columns in the table.
)
GO

INSERT INTO TestingIndexUsage (FilterColumn1, FilterColumn2, FilterColumn3)
SELECT TOP ( 1000000 )
ABS(CHECKSUM(NEWID()))%200,
ABS(CHECKSUM(NEWID()))%40,
ABS(CHECKSUM(NEWID()))%20
FROM msdb.sys.columns a CROSS JOIN msdb.sys.columns b
GO

First off, I’m going to create three individual indexes on the three filter columns and see what kind of plan SQL comes up with.

CREATE INDEX idx_Temp_FilterColumn1 ON dbo.TestingIndexUsage (FilterColumn1)
CREATE INDEX idx_Temp_FilterColumn2 ON dbo.TestingIndexUsage (FilterColumn2)
CREATE INDEX idx_Temp_FilterColumn3 ON dbo.TestingIndexUsage (FilterColumn3)

And the query…

SELECT ID FROM dbo.TestingIndexUsage
WHERE FilterColumn1 = 68 -- 4993 matching rows
AND FilterColumn2 = 26 -- 24818 matching rows
AND FilterColumn3 = 3  -- 49915 matching rows

The comments show how many rows each predicate returns alone. Combined they return 19 rows.

The plan shows the semi-expected index intersection. Seeks on 3 indexes, two merge join operators to join the three resultsets into one.

IndexIntersection

But what about the performance characteristics?

Table ‘TestingIndexUsage’. Scan count 3, logical reads 150, physical reads 0.

SQL Server Execution Times:
CPU time = 422 ms,  elapsed time = 435 ms.

The reads aren’t very high (as the indexes are extremely narrow), but that CPU time is not exactly low. Almost half a second on the CPU to return 19 rows from a 1 million row table? Not good, especially if this is going to run often.

Right, so that’s the three separate indexes. What about the case of a single index with all three columns. In this case, because all three are SARGable equality predicates, the order of the columns isn’t critical for index usage, so I’ll put them in order of selectivity.

DROP INDEX idx_Temp_FilterColumn1 ON dbo.TestingIndexUsage
DROP INDEX idx_Temp_FilterColumn2 ON dbo.TestingIndexUsage
DROP INDEX idx_Temp_FilterColumn3 ON dbo.TestingIndexUsage

CREATE INDEX idx_Temp_FilterColumn123 ON dbo.TestingIndexUsage (FilterColumn1, FilterColumn2, FilterColumn3)

And run the query again.

As kinda expected, the execution plan has a single index seek operation. Exec plan looks cleaner, what do the performance characteristics say?

Table ‘TestingIndexUsage’. Scan count 1, logical reads 3, physical reads 0.

SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 0 ms.

Just about says it all. 147 fewer reads and a 100% reduction in CPU cost. The reduction in reads isn’t going to make a major difference, the reads were low anyway, but the reduction in CPU cost is going to make an impact if this query is frequently run.

So what can we conclude from this?

The optimal index for a query with multiple conditions in the where clause is a single index with all the columns that are used in the where clause in it. The order of these columns may matter, depending on how they are used in the where clause (see Equality predicates and Inequality predicates)

SQL can use multiple indexes on a single table (Index Intersection), but it’s not the most efficient option. It’s worth nothing that SQL won’t always chose to do the index intersection. It may quite well decide that a table/clustered index scan is faster than the multiple seeks and joins that the intersection will do. Or, if one of the conditions is very selective, it may decide to seek on one of the indexes, do key lookups to fetch the rest of the columns and then do secondary filters to evaluate the rest of the predicates.

Now it may not always be possible to create a perfect index for all queries on a table, so in some cases, especially for less important queries, having multiple indexes that SQL can seek and intersect may be adequate, but for the more critical, more frequently run queries you probably want a single index with the appropriate columns.

As an aside, this is why the often-mentioned index ‘strategy’ of a single column index on each column of a table is near-useless and certainly not worth the title ‘strategy’.

26 Responses to “One wide index or multiple narrow indexes?”

  1. Hi,

    What would be the ideal index design, when each column in the WHERE clause is optional, eg. “WHERE ColA = @A AND ColC = @C”, or “WHERE ColB = @B AND ColC = @C”? In this case it would be the better option to work with 3 separate indexes, or?

    Thanks

    -Klaus

  2. Thanks for this article. Very clear explaination of the difference between these two types of index usage.

  3. Klaus, in that cast you’d probably want 2 different 2-column wide indexes. (A, C) and (B, C), but it really depends on how the queries are designed and working, and how selective those columns are.

    What you want is the minimum number of indexes to fully (if possible) support the queries.

  4. Great post!

    Question: Does INCLUDE impact this? What are your thoughts on creating an index on FilterColumn1 that includes values for FilterColumn2 and FilterColumn3?

    :{> Andy

  5. Not if all three columns are filtered on. With FilterColumns 2 and 3 only as include columns, SQL can’t (obviously) seek on them. It can only seek on Column1.

    Now sure, those columns are in the index, so no index intersection or additional lookups are required, but SQL still has to retrieve far more data than is actually needed and then filter out the rows it doesn’t need.

    It goes back to one of my key principles for indexes – allow SQL to reduce the number of rows as much as possible as early as possible.

    Would you prefer to read all the Browns in the telephone directory looking for M. Brown who lives in Benoni, or would you prefer to be able to go straight to Brown, M. Benoni?

  6. Gila,

    Always detailed and informative, great post. Thanks for helping those in need of solid direction on SQL Server.

    Thomas LeBlanc
    TheSmilingDBA

  7. [...] Unique Indexes can also do without columns in the index nodes. 12. Gail Shaw (@SQLintheWILD) One wide index or multiple narrow indexes? Gail writes an authoritative post on indexing strategy. Hers is one of the few contribution that [...]

  8. Gail,
    Not only a great article, but excellent examples. I’ve learned a great deal about the internals of SQL Server from all of your articles. Thanks for sharing!
    Steve

  9. Glad you’re finding them useful.

  10. Hi Claus

    The one index solution will still be the best choice even if you have three separate where clauses. The reason for this is that the execution plan will be the same for each of the where statements and you get the added benefit of having only one index for Insert, Deletions and Additions which will save a lot of time.

    Hope it helps out.

  11. I don’t agree.

    If the index is on three columns (A, B, C), any query that does not filter on A cannot seek on that index. See http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

  12. I’ve wondered about these questions myself… Thank you for the post and for answering the questions in the comments Gail.

  13. So well written, easy to understand, and to the point. Thanks.

  14. Excellent article, thanks for sharing.

  15. Great Explanation..!!!!!

  16. Direct, to the point, and brings about a lot of clarity. Great stuff!

  17. Thanks. Glad you found it useful.

  18. Once again your knowledge puts DBA’s to shame.

  19. Excellent.. !!!

  20. What a learning it was to read this article. I have a question here from architecture point of view.When you are not sure that what queries will be written over created table; what should be indexing strategy? We are using independent indexes in such cases and I think I am going to follow it even after getting convinced from this article.

  21. That’s not all that uncommon a scenario.

    You trace and monitor and create indexes to support the most frequent, most important queries that you pick up in your monitoring.

  22. Post is really enlightening. How do i create indexes for table A, on which i sometimes i query on 3 columns and sometimes i query on 2 columns. From the above Post i should create 2 Index one have 3 columns and another one which has 2 columns ? Please advise

  23. Forgot to mention that Table A has only Col1, Col2 and Col3 columns.

  24. I suggest you post that as a forum question somewhere like SQLServerCentral. Without details of what then where clauses actually look like, that’s very hard to answer.

    See http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/ and http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/

  25. Well, I was curious as to what multi-index advantages MS SQL Server had over MySQL… turns out none. Thanks for the analysis.

  26. I have a table which has around 50 columns. But there are only 6 columns on which we perform regular search. When I am trying to create a non clustered index on it I am getting the warning of 900 bytes. This would be a problem as we also do bulk update/inserts. So can onyone help me in finding a solution?

    If I put few columns out of those 6 in INCLUDE, will that cause any performance issue?

    Lets say the datatypes for the columns are -

    ColumnA (nvarchar(255),null)
    ColumnB (Int, null)
    ColumnC (nvarchar(255),null)
    ColumnD (Datetime,Null)
    ColumnE (nvarchar(255),null)

    NOTE: All these above columns are used in the same query to retrieve data. Please let me know an efficient way of doing it.

Leave a Reply