Catch-all queries

The query form that I refer to as ‘Catch-all’ typically results from search screens in the application where the user may enter any one (or more) of a number of optional parameters. One of the more common ways for such a query to be written in SQL is with multiple predicates in the where clause of the form (WHERE SomeColumn = @SomeVariable OR @SomeVariable IN NULL)

Now this does work, the problem is that it works fairly inefficiently and, on large tables, can result in really poor query performance. I’m going to take a look at why that is the case and what alternatives there are.

Erland Sommarskog has written on this as well, and in a lot more detail than I’m going to. His article on dynamic search conditions is well worth reading, as are the rest of his articles.

A typical example of a ‘catch-all’ query would be this one, based off a table in the AdventureWorks database.

CREATE PROCEDURE SearchHistory
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost from Production.TransactionHistory
WHERE (ProductID = @Product Or @Product IS NULL)
AND (ReferenceOrderID = @OrderID OR @OrderID Is NULL)
AND (TransactionType = @TransactionType OR @TransactionType Is NULL)
AND (Quantity = @Qty Or @Qty is null)
GO

Now, let’s say that I run that query and pass values for the ProductID and the Transaction type. Let’s further say that there’s a nonclustered index (called idx_TranHistory_TranTypeProductID) on those two columns.

EXEC SearchHistory @Product = 978, @TransactionType = 'W'

Now this returns 52 rows out of 980000 that are in the table, so we’d expect that SQL would use an index seek operation on that index, followed by a bookmark lookup.

Nope. It’s using that index all right, but it’s doing a scan, not a seek. Ok, not great, but not bad. Let me try a different set of parameters

EXEC SearchHistory @Qty = 100

The plan’s exactly the same. No surprise, it was cached the first time and then reused. There’s a problem here though, the index that’s used is completely inappropriate and there’s a bookmark lookup that ran almost a million times. No wonder this execution took 3 seconds and 2,949,715 IOs to return 29 rows.

Ok, so let me try a different form of the catch-all query

CREATE PROCEDURE SearchHistory_Improved
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost from Production.TransactionHistory
WHERE (ProductID = CASE WHEN @Product IS NULL THEN ProductID ELSE @Product END)
AND (ReferenceOrderID = CASE WHEN @OrderID IS NULL THEN ReferenceOrderID ELSE @OrderID END)
AND (TransactionType = CASE WHEN @TransactionType IS NULL THEN TransactionType ELSE @TransactionType END)
AND (Quantity = CASE WHEN @Qty IS NULL THEN Quantity ELSE @Qty END)
GO

Let’s see what that does for the first test:

EXEC SearchHistory_Improved @Product = 978, @TransactionType = 'W'

Well that’s no better. Full blown table scan.

The problem with these types of queries is that there is no stable plan. The optimal plan differs completely depending on what paramters are passed. The optimiser can tell that and it plays safe. It creates plans that will always work. That’s (one of the reasons) why in the first example it was an index scan, not an index seek.

The downside of the safe plan is that it’s highly unlikely to be a good plan and, even if it is, it won’t be good for all possible combinations of plans.

So, how to handle this type of query? Well, there are typically two ways.

Recompile

This is only an option on SQL 2008. On 2008, if the query is specified with the OPTION (RECOMPILE) hint, then the optimiser knows it doesn’t have to worry about safe plans because the plan will never be reused. In fact, if I add that hint to the query in the first example, I get the expected index seek.

Dynamic SQL

The other option is to build up the query string dynamically, based on the parameters passed and then to use sp_executesql to run it. There are the ususal downsides to dynamic SQL but, it may be that the performance improvement is worth it.

CREATE PROCEDURE SearchHistory_Dynamic (@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
DECLARE @sSQL NVARCHAR(2000), @Where NVARCHAR(1000) = ''
SET @sSQL = 'SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
from Production.TransactionHistory '

IF @Product is not null
SET @Where = @Where + 'AND ProductID = @_Product '
IF @OrderID is not null
SET @Where = @Where + 'AND ReferenceOrderID = @_OrderID '
IF @TransactionType IS NOT NULL
SET @Where = @Where + 'AND TransactionType = @_TransactionType '
IF @Qty IS NOT NULL
SET @Where = @Where + 'AND Quantity = @_Qty '

IF LEN(@Where) > 0
SET @sSQL = @sSQL + 'WHERE ' + RIGHT(@Where, LEN(@Where)-3)

EXEC sp_executesql @sSQL,
N'@_Product int, @_OrderID int, @_TransactionType char(1), @_Qty int',
@_Product = @Product, @_OrderID = @OrderID, @_TransactionType = @TransactionType, @_Qty = @Qty

GO

Note that there’s no SQL injection vulnerability in this. The parameters are never concatenated into the string and the execution is parametrised.

Now each different set of parameters gets a different cached plan, optimal for that particular set of parameters.

EXEC SearchHistory_Dynamic @Product = 978, @TransactionType = 'W'

EXEC SearchHistory_Dynamic @Qty = 100

The first gets an index seek, the second a clustered index scan (because there’s no index on Quantity). Much better than the behaviour with the earlier non-dynamic versions.

74 Comments

  1. Jack Corbett

    Nice post Gail. I do have a question that I probably should already know the answer to, but I don’t so I’ll ask. How do you handle security in the dynamic sql option since it changes context and I typically would not grant any read rights to a user?

    Reply
  2. Michelle Ufford

    Great article, Gail! This was one of those things I had to learn the hard way when I was first getting started. It seemed so much easier to build one wider proc with a few extra parameters than to build several procs that did almost the same thing. I’ve since learned that the performance just isn’t there, which you’ve done a great job of illustrating.

    I tend to use ‘prepared SQL’ whenever I have a search-type proc, and it seems to perform fairly well.

    Reply
  3. Jerry Hung

    Good read indeed, it sometimes presents as a problem in SSRS reports as well, as users love to have so many parameters to choose from (NULL or multi-value, cascading parameters, …)

    And people tend to not think too much when it is just “reports” vs production.

    Reply
  4. Gail

    Jack: EXECUTE AS on the stored procedure.

    So one user who only has read rights on the specific tables that are used in the dynamic SQL (and no other rights at all, including login rights) and then the procedure definition would be

    CREATE PROCEDURE SearchHistory_Dynamic
    WITH EXECUTE AS ‘DynamicSQLUser’
    AS
    ….

    Reply
  5. uri

    good article. thanks
    now a question :
    is the second way suggested also prevents parameter sniffing or should I use the following code:

    CREATE PROCEDURE SearchHistory_Improved (@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
    AS

    declare @i_ProductId int
    declare @i_OrderId int
    declare @i_TransactionType char(1)
    declare @i_Qty int

    set @i_ProductId = CASE WHEN @Product IS NULL THEN ProductID ELSE @Product END
    set @i_OrderId = CASE WHEN @OrderID IS NULL THEN ReferenceOrderID ELSE @OrderID END
    set @i_TransactionType = CASE WHEN @TransactionType IS NULL THEN TransactionType ELSE @TransactionType END
    set @i_Qty = CASE WHEN @Qty IS NULL THEN Quantity ELSE @Qty END

    SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost from Production.TransactionHistory
    WHERE ProductID = @i_ProductId
    AND ReferenceOrderID = @i_OrderId
    AND TransactionType = @i_TransactionType
    AND Quantity = @i_Qty

    GO

    Reply
  6. Gail

    That code you’ve posted doesn’t work.

    Msg 207, Level 16, State 1, Procedure SearchHistory_Improved, Line 9
    Invalid column name ‘ProductID’.
    Msg 207, Level 16, State 1, Procedure SearchHistory_Improved, Line 10
    Invalid column name ‘ReferenceOrderID’.
    Msg 207, Level 16, State 1, Procedure SearchHistory_Improved, Line 11
    Invalid column name ‘TransactionType’.
    Msg 207, Level 16, State 1, Procedure SearchHistory_Improved, Line 12
    Invalid column name ‘Quantity’.

    Look at your SET statements. They reference columns of a table within them.

    Even if you could do that, it wouldn’t work as well as the dynamic or recompiled versions because the optimiser can’t sniff the value of variables and tends to make bad guesses about cardinality

    See – http://www.sqlinthewild.co.za/index.php/2008/05/22/parameter-sniffing-pt-2/

    Reply
  7. Karthik

    A very good post Gail. Thanks..

    Reply
  8. Markus Gallagher

    How does COALESCE(@Qty, Quantity) = Quantity perform?

    Reply
  9. Gail

    Considering that a COALESCE is treated internally as a CASE statement, probably the same as the case statement. I’ll test it and the ISNULL option and post a followup later this week.

    Reply
  10. Dan

    I’m confused. In the article it states:

    Note that there’s no SQL injection vulnerability in this. The parameters are never concatenated into the string

    But, looking at the code in the window, they are concatenated. What did I miss?

    Reply
  11. Doug Osborne

    Gail,

    I’ve done the exact same thing – just a little different.

    SET @sSQL = ‘SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
    from Production.TransactionHistory
    WHERE 1 = 1 ‘

    IF @Product is null
    SET @P1 = ”
    ELSE
    SET @P1 = ‘AND ProductID = @_Product ‘
    .
    .
    .
    IF @Qty IS NULL
    SET @P4 = ”
    ELSE
    SET @P4 = ‘AND Quantity = @_Qty ‘

    SET @SQL = @sSQL + @P1 + @P2 + @P3 + @P4

    What is the cost of the reassignment of the strings each time in your scenario? Is it negligible?

    Doug

    Reply
  12. Gail

    Dan: Where?
    The parameters that were passed to the function were never concatenated into the string. There’s no piece of code that looks like this

    SET @Where = @Where + ‘AND TransactionType =”’ + @_TransactionType + ””

    That’s concatenating the value of the parameter into the string and that’s what makes a piece of dynamic code vulnerable to injection.

    What I have done is set up variables within the SQL string (that’s what the @_Qty and the others are) and then passed them values at execution time using sp_executesql.

    Reply
  13. Alex

    Of course indexes don’t work on dynamic “where” clause, whatever you use, COALESCE, CASE, IS NULL, etc. And dynamic “where” clause is not always bad or always good. Just do smart job, check you parameters before run query. If you have some parameters on indexed column run them first and run other on subsequent. This is not some magic; it always depends on every case.

    Reply
  14. Gail

    Alex, could you explain in more detail please? I’m not sure I fully understand what you’re suggesting.

    Reply
  15. Stephen

    Gail,
    Good article – this is a common situation at my current employer, usually going unnoticed until the inappropriate plan results in a 1,000-fold increase in the execution cost.
    The usual defense here is to keeep the original proc as a wrapper for variations of the original, where the variations deal with specific more targetted combinations of meaningful parameters. Of course it’s impractical to do this for every possible combination of (say) 10 possible parameters. It’s usually enough to write purpose-specific procs to avoid the major plan inefficiencies, and have the other (less diabolical) combinations fall back to the catch-all query.

    Reply
  16. Alex

    As I sad it is really depend on every case, but usually, on big tables it is better to run as much better plan as you can get first and save result to some where, on temporary table for example, and after run what ever left on that result. I hope my explanation clear, English not my first language.

    Reply
  17. Gail

    Alex: Yup, I think I understand you. Use some of the filters first, save to a temp table and then filter further.
    Can you maybe give a short code example (on made up tables, or the Adventureworks tables) as I would like to do a followon post with some timings on various options.

    Reply
  18. JacekO

    Hi Gail.
    This is an intersting topic and I run some tests myself to see how this works and got puzzled by one thing.

    There is an index on ProductID in the Production.TransactionHistory table.
    However I can not figure out why the server is not using it even if the query is as plain as this one
    SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
    from Production.TransactionHistory
    WHERE ProductID = 331
    It uses Clustered Index Scan

    But if I create a new index on Quantity and run this
    SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
    from Production.TransactionHistory
    WHERE Quantity = 100
    I got Index Seek with Key Lookup
    Any ideas what am I missing.

    Reply
  19. Gail

    Most likely it’s due to selectivity. If those indexes aren’t covering then there’s a cost to using the NC index as SQL has to do lookups to the cluster for each row. The tipping point is usually about 0.5% of the table.

    If the query needs more than that, SQL’s likely to table scan rather than doing the NC index seek and lots and lots of key lookups.

    I wrote a blog entry a while back on this – http://www.sqlinthewild.co.za/index.php/2009/01/09/seek-or-scan/

    Reply
  20. Pingback: Catch all queries and indexing - SQL and the like

  21. Ed

    Gail, I must be missing something because I’ve tinkered with yoru example and no matter what I try I get errors. I repalced all the charatcers that did not copy over correctly like the single quotes but still it errors out on me.

    Reply
  22. Ed

    Gail – Nevermind I figured out my problem. The sample AdventureWorks DB is set to a case senative collation.

    Reply
  23. Gail

    I think that may depend on the server collation. AdventureWorks is case insensitive on my server (as are all of my DBs). After a rather nasty experience a few years back I’m allergic to case-sensitive databases 😉

    Reply
  24. Brian

    Thank you so much for this article, it was very informative and extremely helpful. This is by far one of the most useful SQL articles I’ve read over the years!

    Reply
  25. David Walker

    I think you left out an important section from “Dynamic Search Conditions”. Instead of using this form:

    WHERE (key1 = @key1 OR @key1 IS NULL)
    AND (key2 = @key2 OR @key2 IS NULL)
    AND (key3 = @key3 OR @key3 IS NULL

    you should consider using this:

    WHERE (key1 = @key1 AND @key1 IS NOT NULL)
    OR (key2 = @key2 AND @key2 IS NOT NULL)
    OR (key3 = @key3 AND @key3 IS NOT NULL)

    As Erland Sommarskog says, “The odds are very good that SQL Server will generate a plan which seeks the three key indexes and then merges the result either by index concatenation or some other method. Here is the real big scoop: thanks to the condition, @x IS NOT NULL, SQL Server adds a filter with a startup expression to the plan, so if the corresponding variable is NULL, SQL Server will not access that index at all.”

    I have found in my testing that this solution performs very well; indexes ARE used, performance is great, and dynamic code is not required.

    Reply
  26. Michael

    I thought that dynamic SQL was a worse choice than the catch-all option, but this article says otherwise ? I got a big chunk of dynamic sql from a contractor and it’s harder to follow than non-dynamic. What are the drawbacks of dynamic ?

    Reply
  27. Gail

    Main drawbacks of dynamic is security. Also the loss of syntax highlighting, compile time error checking and intellisense. It’s not something I’d use everywhere, just for the places it does work well (and this is one of them)

    Reply
  28. Emtucifor

    David Walker, your query isn’t logically the same. If 3 keys were passed in, your OR query would pull all rows where any one matches, not rows where all three match.

    Reply
  29. Daniel

    I guess Walker’s query logic can be modified as such to handle the scenario when two or three parameters are passed.
    WHERE ((key1 = @key1 AND @key1 IS NOT NULL) OR (@key1 IS NULL))
    AND ((key2 = @key2 AND @key2 IS NOT NULL) OR (@key2 IS NULL))
    AND ((key3 = @key3 AND @key3 IS NOT NULL) OR (@key3 IS NULL))

    btw, this logic would be a better approach than the dynamic query..atleast for these types of situatios…

    Reply
  30. Mark Tassin

    Gail,

    How would you use this methodology when working with a list of keywords aka a keyword search where a hit on any of the keywords should return a record (if there are multiple hits, just return the record once).

    Reply
  31. Gail (Post author)

    Frankly, if I was doing keyword searches I’d probably be using full text search.

    Reply
  32. Steve Lund

    Could you also use the WITH RECOMPILE option on the proc if you have SQL 2005 instead of 2008? Would that give the same result as the OPTION (RECOMPILE) query hint as there is only one statement in the proc?

    Reply
  33. Gail (Post author)

    No, you’ll still get the non-optimal plan. In 2005 the optimiser is required to create plans that are safe for reuse, even if they won’t be reused (because of recompile). Only in 2008 SP2 onwards does the recompile work properly – plans optimal for that particular parameter set.

    Reply
  34. Dave Burrows

    Gail,
    I have, in the past, done extract to temp table and then filter and found significant performance boost. This does depend on what indexes are present and there is at least one mandatory search parameter. So, for example, using AdventureWorks and using mandatory FromDate and ToDate for TransactionDate, the revised SQL would look like

    SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
    INTO #hist
    FROM Production.TransactionHistory
    WHERE TransactionDate BETWEEN @FromDate AND @ToDate

    IF @Product IS NOT NULL DELETE #hist WHERE ProductID @Product
    IF @OrderID IS NOT NULL DELETE #hist WHERE ReferenceOrderID @OrderID
    IF @TransactionType IS NOT NULL DELETE #hist WHERE TransactionType @TransactionType
    IF @Qty IS NOT NULL DELETE #hist WHERE Quantity @Qty

    SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
    FROM #hist

    Reply
  35. Stefan Krzywicki

    So what do you do in your Dynamic query when ‘WHERE TransactionType IS NULL’ is a valid search parameter? Do you have to pass a different value and filter/build the query based on that? If you pass NULL instead of just leaving the parameter out of the EXEC that calls SearchHistory_Dynamic, is there a way to tell?

    Reply
  36. Gail (Post author)

    Maybe another parameter for the proc?

    @TranType INT, @TranTypeIsNull BIT, …

    and then add additional logic when you’re building up the dynamic SQL. Or maybe a special value for the parameter that means ‘column is null’, -1 or MinInt if the column is identity

    Reply
  37. Dave Burrows

    Or use a non NULL char that is not a valid TransactionType (maybe empty string) to mean ‘ALL’

    Reply
  38. Stefan Krzywicki

    Dave, I had thought of the non-null other value and that’s a great way to do it for most of the data types, but one of the ones I’m concerned about is a bit column where 1, 0 and NULL are all valid entries.

    Gail, thanks! I hadn’t thought of using 2 variables for the same column, but I think that’ll work perfectly for NULL bit columns.

    Reply
  39. Dave Burrows

    Stefan. Yeah that is always a problem especially when extended to dates 🙁
    For bit columns you could use and translate a tinyint for bit column selection, ie 0=0 1=1 2=NULL 3=ALL
    I have done this before, as long as both ends talk the same translation and data type not a problem although the use of a different data type may be unconfortable.

    Reply
  40. Sergio A.

    Hey thanks for these good examples. I’m working in a query where the user enters different parameters and still don’t know which option I’m going to take. I have done dynamic SQL before but I don’t feel very confident about that. Still as you were saying, the performance could be way better than having several conditions in the WHERE clause.

    Reply
  41. Pingback: Be wary of using ISNULL() for retreiving all rows « Simran Jindal's Blog

  42. Florian Reischl

    Hi Gail
    Wanted to say thanks (way too late though..) for this really great post.
    Dunno how often I referred people here till now and just did it again.
    Greets
    Flo

    Reply
  43. Dipesh Trivedi

    Hi
    i want to find the combination of Amounts…
    For Example : i am inputing amount in textbox vb.net, and result should be all the combination of that entered amount in DB.
    textbox.text = 500 then result = lets say
    1. 500
    2. 250, 250 (in 2 rows)
    3. 100,100,100,100,100 (in 5 rows)

    is this possible?
    Someone can help me on this Please..
    My E-mail Id is <removed>

    Regards,
    Dipesh

    Reply
  44. Gail (Post author)

    Please post this somewhere like the MSDN forums. There’s no where near enough information given, and your question has nothing to do with this post.

    Reply
  45. Lin Brown

    Hi, i’m new to sql and i have this problem with updating data from another table.

    i want to update table2.message based on the criteria of table1.name. for example, all records named John will be updated with ‘Msg1’ in table 2.message. Im using MS SQL 2000 and below is the scenario.

    <editor: removed>

    do you have suggestion on how to update this in bulk without preselecting all the names?

    Reply
  46. Gail (Post author)

    Lin, best if you post this on a SQL forum, like the ones at MSDN or SQLServerCentral. Your question has nothing to do with the post.

    Reply
  47. Dave Jackson

    Gail

    just to let you now I’ve referenced this page from something I’ve posted to my site. http://glossopian.co.uk/pmwiki.php?n=Main.TestScriptGenerator
    I hope you enjoy it.

    Reply
  48. Gail (Post author)

    Nice. I’d leave a comment there, but I don’t see any way to do so.

    Reply
  49. RickD

    Hi Gail, Niice article, I wish I could follow the dynamic SQL route but am still struggling alomng with SQL 2005. I got around the inefficiency by intoducing pagination to my query. I get 10 (or 20,30 etc) records key fields with all the parameters (the query itself is very convoluted, so not going to even attempt to post it), then get the detail I need in a following query. This and other changes (many of my parameters are multi-select, so used a CLR to split into mem tables), took a query that was taking well over a second (usually about 1600ms) down to <100ms even for the worst culprits.

    Reply
  50. Gail (Post author)

    The dynamic SQL route works fine in SQL 2005. In fact, on 2005 it’s the best option because you can’t use the OPTION(RECOMPILE)

    Reply

Leave a Comment

Your email address will not be published. Required fields are marked *