When I looked at indexing for queries containing predicates combined with OR, it became clear that the are some restrictive requirements for indexes for the optimiser to consider using the indexes for seek operations.
- Each predicate (or set of predicates) combined with an OR must have a separate index
- All of those indexes must be covering, or the row count of the concatenated result set low enough to make key lookups an option, as the optimiser does not apparent to consider the possibility of doing key lookups for a subset of the predicates before concatenating the result sets.
So what can be done if it’s not possible to meet those requirements?
The standard trick is to convert the query with ORs into multiple queries combined with UNION. The idea is that since OR predicates are evaluated separately and the result sets concatenated, we can do that manually by writing the queries separately and concatenating them using UNION or UNION ALL. (UNION ALL can only be safely used if the predicates are known to be mutually exclusive)
CREATE TABLE Persons ( PersonID INT IDENTITY PRIMARY KEY, FirstName VARCHAR(30), Surname VARCHAR(30), Country CHAR(3), RegistrationDate DATE ) CREATE INDEX idx_Persons_FirstName ON dbo.Persons (FirstName) INCLUDE (Surname) CREATE INDEX idx_Persons_Surname ON dbo.Persons (Surname) INCLUDE (FirstName) GO -- Data population using SQLDataGenerator SELECT FirstName, Surname FROM dbo.Persons WHERE FirstName = 'Daniel' OR Surname = 'Barnes' SELECT FirstName, Surname FROM dbo.Persons WHERE FirstName = 'Daniel' UNION SELECT FirstName, Surname FROM dbo.Persons WHERE Surname = 'Barnes'
In this case, the OR can be replaced with a UNION and the results are the same. The Union form is slightly less efficient according to the execution plan’s costings (60% compared to the OR at 40%), and the two queries have the same general form, with two index seeks and some form of concatenation and remove duplicates.
So in that case it worked fine, although the original form was a little more efficient
Some care does need to be taken however, as the query with OR and the query with UNION may not always be equivalent, and it has to do with the elimination of duplicate rows.
In an OR, if a row qualifies for both of the predicates, it’s only returned once. That should be obvious, it’s how things should work, we don’t want to see the row multiple times just because it qualifies for more than one of the OR predicates. If we change that to UNION ALL then the row will be returned twice, it appears in both queries that are concatenated, and UNION ALL means combine without eliminating duplicates.
SELECT FirstName, Surname FROM dbo.Persons WHERE FirstName = 'Herman' OR Surname = 'Anderson' SELECT FirstName, Surname FROM dbo.Persons WHERE FirstName = 'Herman' UNION ALL SELECT FirstName, Surname FROM dbo.Persons WHERE Surname = 'Anderson'
In that example, Herman Anderson appears once in the results of the OR query and twice in the results of the UNION ALL. That’s because it qualifies for both predicates. The OR eliminated the duplication, the UNION ALL does not.
So change that UNION ALL to UNION so that the elimination of duplicate rows is done, the row appears only once and life is good again. Or is it?
SELECT FirstName, Surname FROM dbo.Persons WHERE FirstName = 'Alfred' OR Surname = 'Hickman' SELECT FirstName, Surname FROM dbo.Persons WHERE FirstName = 'Alfred' UNION SELECT FirstName, Surname FROM dbo.Persons WHERE Surname = 'Hickman' ORDER BY FirstName, Surname
This time, Alfred Hickman appears twice in the results from the OR, but only once in the output from the UNION
The difference comes in how the duplicates are eliminated. With an OR, SQL does the elimination of duplicates based on the key value regardless of what may be in the select list. With an UNION, SQL does the elimination of duplicates based on the select list, regardless of what the key value may be and in the above example there were two rows in the table with the value ‘Alfred Hickman’. So with UNION you can lose rows if they are duplicated in the table.
The solution’s fairly simple, if converting an OR into a UNION, ensure that the key column(s) are in the select list, then the duplicate elimination done by the UNION will only remove rows that were part of both result sets, instead of also removing ones that really do appear twice in the table.
So in conclusion, if you’re replacing a query using OR with a query using UNION, be careful with the finer details around duplicates. If you know the conditions are mutually exclusive, use UNION ALL. If you don’t, use UNION and ensure that the table’s key column(s) are present in the select list so that the UNION doesn’t remove rows that you don’t want it to remove.
Holy crap, talk about good timing. We’ve had a huge thread going on about this exact issue just yesterday.
http://www.sqlservercentral.com/Forums/Topic1135362-391-1.aspx
A thread I managed to miss… I trust you’ll post the link if necessary?
Pingback: SFTW - SQL Server Links, News and Community Stuff This Week
Thank you much!
This focused much more on the pitfalls to worry about than why you should do it. Still it planted the seed that I should look into some of these super queries and see if a union might be more efficient.
Well, when is fairly easy. When it performs better than the OR and gives the same results.
Hello, I’m little bit lost in here .. what I’m interesting in is what will be indexes look like when we use last post example with CustomerID,FirstName,SurName, isActive, RegistrationDate and union .. I mean, there is a sentence
“So what happens if it’s not practical to make both indexes covering?”, but doesn’t need to be also two covering indexes when using union as well?
thank you for explaining
ok I have made my own test .. now i understand 🙂
thanks