Catch-all queries

The query form that I refer to as ‘Catch-all’ typically results from search screens in the application where the user may enter any one (or more) of a number of optional parameters. One of the more common ways for such a query to be written in SQL is with multiple predicates in the where clause of the form (WHERE SomeColumn = @SomeVariable OR @SomeVariable IS NULL)

Now this does work. The problem is that it works fairly inefficiently and, on large tables, can result in really poor query performance. I’m going to take a look at why that is the case and what the alternatives are.

Erland Sommarskog has written on this as well, and in a lot more detail than I’m going to. His article on dynamic search conditions is well worth reading, as are the rest of his articles.

A typical example of a ‘catch-all’ query would be this one, based on a table in the AdventureWorks database.

CREATE PROCEDURE SearchHistory
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
    TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE (ProductID = @Product OR @Product IS NULL)
AND (ReferenceOrderID = @OrderID OR @OrderID IS NULL)
AND (TransactionType = @TransactionType OR @TransactionType IS NULL)
AND (Quantity = @Qty OR @Qty IS NULL)
GO

Now, let’s say that I run that query and pass values for the ProductID and the Transaction type. Let’s further say that there’s a nonclustered index (called idx_TranHistory_TranTypeProductID) on those two columns.

EXEC SearchHistory @Product = 978, @TransactionType = 'W'

Now this returns 52 rows out of 980000 that are in the table, so we’d expect that SQL would use an index seek operation on that index, followed by a bookmark lookup.

Nope. It’s using that index all right, but it’s doing a scan, not a seek. OK, not great, but not terrible. Let me try a different set of parameters.

EXEC SearchHistory @Qty = 100

The plan’s exactly the same. No surprise; it was cached the first time and then reused. There’s a problem here though: the index that’s used is completely inappropriate, and there’s a bookmark lookup that ran almost a million times. No wonder this execution took 3 seconds and 2,949,715 IOs to return 29 rows.

OK, so let me try a different form of the catch-all query.

CREATE PROCEDURE SearchHistory_Improved
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE (ProductID = CASE WHEN @Product IS NULL THEN ProductID ELSE @Product END)
AND (ReferenceOrderID = CASE WHEN @OrderID IS NULL THEN ReferenceOrderID ELSE @OrderID END)
AND (TransactionType = CASE WHEN @TransactionType IS NULL THEN TransactionType ELSE @TransactionType END)
AND (Quantity = CASE WHEN @Qty IS NULL THEN Quantity ELSE @Qty END)
GO

Let’s see what that does for the first test:

EXEC SearchHistory_Improved @Product = 978, @TransactionType = 'W'

Well, that’s no better. Full-blown table scan.

The problem with these types of queries is that there is no stable plan. The optimal plan differs completely depending on what parameters are passed. The optimiser can tell that, so it plays safe and creates plans that will always work. That’s (one of the reasons) why in the first example it was an index scan, not an index seek.

The downside of the safe plan is that it’s highly unlikely to be a good plan and, even if it is, it won’t be good for all possible combinations of parameters.

So, how to handle this type of query? Well, there are typically two ways.

Recompile

This is only an option on SQL 2008. On 2008, if the query is specified with the OPTION (RECOMPILE) hint, then the optimiser knows the plan will never be reused, so it doesn’t have to worry about safe plans. In fact, if I add that hint to the query in the first example, I get the expected index seek.
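As a sketch (the procedure name here is mine, purely for illustration), the first procedure with the hint applied would look like this:

```sql
CREATE PROCEDURE SearchHistory_Recompile
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
    TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE (ProductID = @Product OR @Product IS NULL)
AND (ReferenceOrderID = @OrderID OR @OrderID IS NULL)
AND (TransactionType = @TransactionType OR @TransactionType IS NULL)
AND (Quantity = @Qty OR @Qty IS NULL)
OPTION (RECOMPILE) -- plan is compiled fresh for each execution, for the actual parameter values
GO
```

The trade-off is a compile on every execution, which may matter if the procedure is called very frequently.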

Dynamic SQL

The other option is to build up the query string dynamically, based on the parameters passed, and then to use sp_executesql to run it. There are the usual downsides to dynamic SQL, but it may be that the performance improvement is worth it.

CREATE PROCEDURE SearchHistory_Dynamic
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
DECLARE @sSQL NVARCHAR(2000), @Where NVARCHAR(1000) = ''
SET @sSQL = 'SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
FROM Production.TransactionHistory '

IF @Product IS NOT NULL
    SET @Where = @Where + 'AND ProductID = @_Product '
IF @OrderID IS NOT NULL
    SET @Where = @Where + 'AND ReferenceOrderID = @_OrderID '
IF @TransactionType IS NOT NULL
    SET @Where = @Where + 'AND TransactionType = @_TransactionType '
IF @Qty IS NOT NULL
    SET @Where = @Where + 'AND Quantity = @_Qty '

IF LEN(@Where) > 0
    SET @sSQL = @sSQL + 'WHERE ' + RIGHT(@Where, LEN(@Where) - 3)

EXEC sp_executesql @sSQL,
    N'@_Product int, @_OrderID int, @_TransactionType char(1), @_Qty int',
    @_Product = @Product, @_OrderID = @OrderID, @_TransactionType = @TransactionType, @_Qty = @Qty

GO

Note that there’s no SQL injection vulnerability in this. The parameter values are never concatenated into the string (only references to the sp_executesql parameters are) and the execution is parametrised.

Now each different set of parameters gets a different cached plan, optimal for that particular set of parameters.

EXEC SearchHistory_Dynamic @Product = 978, @TransactionType = 'W'

EXEC SearchHistory_Dynamic @Qty = 100

The first gets an index seek, the second a clustered index scan (because there’s no index on Quantity). Much better than the behaviour with the earlier non-dynamic versions.
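To see why, it helps to look at what actually gets sent to sp_executesql. For the first call above (@Product = 978, @TransactionType = 'W'), the generated statement would look something like this:

```sql
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE ProductID = @_Product AND TransactionType = @_TransactionType
```

Only the predicates for the parameters that were actually passed appear, so the optimiser can pick the best plan for exactly that filter.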

65 Responses to “Catch-all queries”

  1. Nice post Gail. I do have a question that I probably should already know the answer to, but I don’t so I’ll ask. How do you handle security in the dynamic sql option since it changes context and I typically would not grant any read rights to a user?

  2. Great article, Gail! This was one of those things I had to learn the hard way when I was first getting started. It seemed so much easier to build one wider proc with a few extra parameters than to build several procs that did almost the same thing. I’ve since learned that the performance just isn’t there, which you’ve done a great job of illustrating.

    I tend to use ‘prepared SQL’ whenever I have a search-type proc, and it seems to perform fairly well.

  3. Good read indeed, it sometimes presents as a problem in SSRS reports as well, as users love to have so many parameters to choose from (NULL or multi-value, cascading parameters, …)

    And people tend to not think too much when it is just “reports” vs production.

  4. Jack: EXECUTE AS on the stored procedure.

    So you create a user who has only read rights on the specific tables that are used in the dynamic SQL (and no other rights at all, including login rights), and then the procedure definition would be

    CREATE PROCEDURE SearchHistory_Dynamic
    WITH EXECUTE AS 'DynamicSQLUser'
    AS
    ….

  5. good article. thanks
    now a question :
    does the second suggested way also prevent parameter sniffing, or should I use the following code:

    CREATE PROCEDURE SearchHistory_Improved (@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
    AS

    declare @i_ProductId int
    declare @i_OrderId int
    declare @i_TransactionType char(1)
    declare @i_Qty int

    set @i_ProductId = CASE WHEN @Product IS NULL THEN ProductID ELSE @Product END
    set @i_OrderId = CASE WHEN @OrderID IS NULL THEN ReferenceOrderID ELSE @OrderID END
    set @i_TransactionType = CASE WHEN @TransactionType IS NULL THEN TransactionType ELSE @TransactionType END
    set @i_Qty = CASE WHEN @Qty IS NULL THEN Quantity ELSE @Qty END

    SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost from Production.TransactionHistory
    WHERE ProductID = @i_ProductId
    AND ReferenceOrderID = @i_OrderId
    AND TransactionType = @i_TransactionType
    AND Quantity = @i_Qty

    GO

  6. That code you’ve posted doesn’t work.

    Msg 207, Level 16, State 1, Procedure SearchHistory_Improved, Line 9
    Invalid column name 'ProductID'.
    Msg 207, Level 16, State 1, Procedure SearchHistory_Improved, Line 10
    Invalid column name 'ReferenceOrderID'.
    Msg 207, Level 16, State 1, Procedure SearchHistory_Improved, Line 11
    Invalid column name 'TransactionType'.
    Msg 207, Level 16, State 1, Procedure SearchHistory_Improved, Line 12
    Invalid column name 'Quantity'.

    Look at your SET statements. They reference columns of a table within them.

    Even if you could do that, it wouldn’t work as well as the dynamic or recompiled versions because the optimiser can’t sniff the value of variables and tends to make bad guesses about cardinality

    See – http://sqlinthewild.co.za/index.php/2008/05/22/parameter-sniffing-pt-2/

  7. A very good post Gail. Thanks..

  8. How does COALESCE(@Qty, Quantity) = Quantity perform?

  9. Considering that a COALESCE is treated internally as a CASE statement, probably the same as the case statement. I’ll test it and the ISNULL option and post a followup later this week.

  10. I’m confused. In the article it states:

    Note that there’s no SQL injection vulnerability in this. The parameters are never concatenated into the string

    But, looking at the code in the window, they are concatenated. What did I miss?

  11. Gail,

    I’ve done the exact same thing – just a little different.

    SET @sSQL = 'SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
    from Production.TransactionHistory
    WHERE 1 = 1 '

    IF @Product is null
    SET @P1 = ''
    ELSE
    SET @P1 = 'AND ProductID = @_Product '
    .
    .
    .
    IF @Qty IS NULL
    SET @P4 = ''
    ELSE
    SET @P4 = 'AND Quantity = @_Qty '

    SET @SQL = @sSQL + @P1 + @P2 + @P3 + @P4

    What is the cost of the reassignment of the strings each time in your scenario? Is it negligible?

    Doug

  12. Dan: Where?
    The parameters that were passed to the function were never concatenated into the string. There’s no piece of code that looks like this

    SET @Where = @Where + 'AND TransactionType = ''' + @_TransactionType + ''''

    That’s concatenating the value of the parameter into the string and that’s what makes a piece of dynamic code vulnerable to injection.

    What I have done is set up variables within the SQL string (that’s what the @_Qty and the others are) and then passed them values at execution time using sp_executesql.

  13. Of course indexes don’t work well with a dynamic “where” clause, whatever you use: COALESCE, CASE, IS NULL, etc. And a dynamic “where” clause is not always bad or always good. Just do a smart job and check your parameters before running the query. If some of your parameters are on indexed columns, filter on those first and apply the rest afterwards. This is not magic; it always depends on the case.

  14. Alex, could you explain in more detail please? I’m not sure I fully understand what you’re suggesting.

  15. Gail,
    Good article – this is a common situation at my current employer, usually going unnoticed until the inappropriate plan results in a 1,000-fold increase in the execution cost.
    The usual defense here is to keep the original proc as a wrapper for variations of the original, where the variations deal with specific, more targeted combinations of meaningful parameters. Of course it’s impractical to do this for every possible combination of (say) 10 possible parameters. It’s usually enough to write purpose-specific procs to avoid the major plan inefficiencies, and have the other (less diabolical) combinations fall back to the catch-all query.

  16. As I said, it really depends on the case, but usually, on big tables, it is better to run the best plan you can get first and save the result somewhere, in a temporary table for example, and then run whatever is left on that result. I hope my explanation is clear; English is not my first language.

  17. Alex: Yup, I think I understand you. Use some of the filters first, save to a temp table and then filter further.
    Can you maybe give a short code example (on made up tables, or the Adventureworks tables) as I would like to do a followon post with some timings on various options.

  18. Hi Gail.
    This is an interesting topic, and I ran some tests myself to see how this works and got puzzled by one thing.

    There is an index on ProductID in the Production.TransactionHistory table.
    However I cannot figure out why the server is not using it, even when the query is as plain as this one:
    SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
    from Production.TransactionHistory
    WHERE ProductID = 331
    It uses Clustered Index Scan

    But if I create a new index on Quantity and run this
    SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
    from Production.TransactionHistory
    WHERE Quantity = 100
    I get an Index Seek with a Key Lookup.
    Any ideas what I am missing?

  19. Most likely it’s due to selectivity. If those indexes aren’t covering then there’s a cost to using the NC index as SQL has to do lookups to the cluster for each row. The tipping point is usually about 0.5% of the table.

    If the query needs more than that, SQL’s likely to table scan rather than doing the NC index seek and lots and lots of key lookups.

    I wrote a blog entry a while back on this – http://sqlinthewild.co.za/index.php/2009/01/09/seek-or-scan/

  20. [...] many others, Gail Shaw has blogged on the subject of catch all queries.  On many occasions I have needed to do something similar and found performance to be pretty [...]

  21. Gail, I must be missing something because I’ve tinkered with your example and no matter what I try I get errors. I replaced all the characters that did not copy over correctly, like the single quotes, but it still errors out on me.

  22. Gail – Never mind, I figured out my problem. The sample AdventureWorks DB is set to a case-sensitive collation.

  23. I think that may depend on the server collation. AdventureWorks is case insensitive on my server (as are all of my DBs). After a rather nasty experience a few years back I’m allergic to case-sensitive databases ;-)

  24. Thank you so much for this article, it was very informative and extremely helpful. This is by far one of the most useful SQL articles I’ve read over the years!

  25. I think you left out an important section from “Dynamic Search Conditions”. Instead of using this form:

    WHERE (key1 = @key1 OR @key1 IS NULL)
    AND (key2 = @key2 OR @key2 IS NULL)
    AND (key3 = @key3 OR @key3 IS NULL)

    you should consider using this:

    WHERE (key1 = @key1 AND @key1 IS NOT NULL)
    OR (key2 = @key2 AND @key2 IS NOT NULL)
    OR (key3 = @key3 AND @key3 IS NOT NULL)

    As Erland Sommarskog says, “The odds are very good that SQL Server will generate a plan which seeks the three key indexes and then merges the result either by index concatenation or some other method. Here is the real big scoop: thanks to the condition, @x IS NOT NULL, SQL Server adds a filter with a startup expression to the plan, so if the corresponding variable is NULL, SQL Server will not access that index at all.”

    I have found in my testing that this solution performs very well; indexes ARE used, performance is great, and dynamic code is not required.

  26. I thought that dynamic SQL was a worse choice than the catch-all option, but this article says otherwise? I got a big chunk of dynamic SQL from a contractor and it’s harder to follow than non-dynamic. What are the drawbacks of dynamic SQL?

  27. The main drawback of dynamic SQL is security. There’s also the loss of syntax highlighting, compile-time error checking and IntelliSense. It’s not something I’d use everywhere, just for the places it does work well (and this is one of them).

  28. David Walker, your query isn’t logically the same. If 3 keys were passed in, your OR query would pull all rows where any one matches, not rows where all three match.

  29. I guess Walker’s query logic can be modified like this to handle the scenario where two or three parameters are passed:
    WHERE ((key1 = @key1 AND @key1 IS NOT NULL) OR (@key1 IS NULL))
    AND ((key2 = @key2 AND @key2 IS NOT NULL) OR (@key2 IS NULL))
    AND ((key3 = @key3 AND @key3 IS NOT NULL) OR (@key3 IS NULL))

    btw, this logic would be a better approach than the dynamic query, at least for these types of situations…

  30. Gail,

    How would you use this methodology when working with a list of keywords aka a keyword search where a hit on any of the keywords should return a record (if there are multiple hits, just return the record once).

  31. Frankly, if I was doing keyword searches I’d probably be using full text search.

  32. Could you also use the WITH RECOMPILE option on the proc if you have SQL 2005 instead of 2008? Would that give the same result as the OPTION (RECOMPILE) query hint as there is only one statement in the proc?

  33. No, you’ll still get the non-optimal plan. In 2005 the optimiser is required to create plans that are safe for reuse, even if they won’t be reused (because of recompile). Only in 2008 SP2 onwards does the recompile work properly – plans optimal for that particular parameter set.

  34. Gail,
    I have, in the past, done an extract to a temp table and then filtered, and found a significant performance boost. This does depend on what indexes are present, and on there being at least one mandatory search parameter. So, for example, using AdventureWorks with a mandatory FromDate and ToDate for TransactionDate, the revised SQL would look like

    SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
    INTO #hist
    FROM Production.TransactionHistory
    WHERE TransactionDate BETWEEN @FromDate AND @ToDate

    IF @Product IS NOT NULL DELETE #hist WHERE ProductID <> @Product
    IF @OrderID IS NOT NULL DELETE #hist WHERE ReferenceOrderID <> @OrderID
    IF @TransactionType IS NOT NULL DELETE #hist WHERE TransactionType <> @TransactionType
    IF @Qty IS NOT NULL DELETE #hist WHERE Quantity <> @Qty

    SELECT ProductID, ReferenceOrderID, TransactionType, Quantity, TransactionDate, ActualCost
    FROM #hist

  35. So what do you do in your Dynamic query when ‘WHERE TransactionType IS NULL’ is a valid search parameter? Do you have to pass a different value and filter/build the query based on that? If you pass NULL instead of just leaving the parameter out of the EXEC that calls SearchHistory_Dynamic, is there a way to tell?

  36. Maybe another parameter for the proc?

    @TranType INT, @TranTypeIsNull BIT, …

    and then add additional logic when you’re building up the dynamic SQL. Or maybe a special value for the parameter that means ‘column is null’, -1 or MinInt if the column is identity
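    A sketch of that flag approach in the dynamic SQL build (the @TranTypeIsNull parameter is hypothetical, following the idea above):

```sql
-- NULL is a valid search value here, so a separate flag says 'search for NULL'
IF @TranTypeIsNull = 1
    SET @Where = @Where + 'AND TransactionType IS NULL '
ELSE IF @TransactionType IS NOT NULL
    SET @Where = @Where + 'AND TransactionType = @_TransactionType '
-- if neither, the column is simply not filtered on
```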

  37. Or use a non NULL char that is not a valid TransactionType (maybe empty string) to mean ‘ALL’

  38. Dave, I had thought of the non-null other value and that’s a great way to do it for most of the data types, but one of the ones I’m concerned about is a bit column where 1, 0 and NULL are all valid entries.

    Gail, thanks! I hadn’t thought of using 2 variables for the same column, but I think that’ll work perfectly for NULL bit columns.

  39. Stefan. Yeah, that is always a problem, especially when extended to dates :-(
    For bit columns you could use and translate a tinyint for the bit column selection, i.e. 0=0, 1=1, 2=NULL, 3=ALL.
    I have done this before; as long as both ends talk the same translation and data type, it’s not a problem, although the use of a different data type may be uncomfortable.

  40. Hey, thanks for these good examples. I’m working on a query where the user enters different parameters and I still don’t know which option I’m going to take. I have done dynamic SQL before but I don’t feel very confident about it. Still, as you were saying, the performance could be way better than having several conditions in the WHERE clause.

  41. [...] Shaw’s article Catch-all queries  (I love the term [...]

  42. Hi Gail
    Wanted to say thanks (way too late though..) for this really great post.
    Dunno how often I referred people here till now and just did it again.
    Greets
    Flo

  43. Hi
    I want to find the combinations of an amount…
    For example: I am entering an amount in a textbox in VB.NET, and the result should be all the combinations of that entered amount in the DB.
    textbox.text = 500, then the result would be, let’s say:
    1. 500
    2. 250, 250 (in 2 rows)
    3. 100,100,100,100,100 (in 5 rows)

    is this possible?
    Someone can help me on this Please..
    My E-mail Id is <removed>

    Regards,
    Dipesh

  44. Please post this somewhere like the MSDN forums. There’s nowhere near enough information given, and your question has nothing to do with this post.

  45. Hi, I’m new to SQL and I have this problem with updating data from another table.

    I want to update table2.message based on the criteria of table1.name. For example, all records named John will be updated with ‘Msg1’ in table2.message. I’m using MS SQL 2000 and below is the scenario.

    <editor: removed>

    do you have suggestion on how to update this in bulk without preselecting all the names?

  46. Lin, best if you post this on a SQL forum, like the ones at MSDN or SQLServerCentral. Your question has nothing to do with the post.

  47. Gail

    just to let you know I’ve referenced this page from something I’ve posted to my site. http://glossopian.co.uk/pmwiki.php?n=Main.TestScriptGenerator
    I hope you enjoy it.

  48. Nice. I’d leave a comment there, but I don’t see any way to do so.

  49. Hi Gail, nice article. I wish I could follow the dynamic SQL route but am still struggling along with SQL 2005. I got around the inefficiency by introducing pagination to my query. I get 10 (or 20, 30 etc) records’ key fields with all the parameters (the query itself is very convoluted, so I’m not going to even attempt to post it), then get the detail I need in a following query. This and other changes (many of my parameters are multi-select, so I used a CLR function to split them into memory tables) took a query that was taking well over a second (usually about 1600ms) down to <100ms even for the worst culprits.

  50. The dynamic SQL route works fine in SQL 2005. In fact, on 2005 it’s the best option because you can’t use the OPTION(RECOMPILE)

  51. G’morning Gail,

    I just got referred in to this post from somewhere or other, and since I use a variety of NULL conditions, I was wondering how my preferred structure works, since I don’t see it above.

    I often use this structure:

    WHERE (CASE WHEN @ProductID IS NULL THEN 1
    WHEN @ProductID = ProductID THEN 1
    ELSE 0
    END) = 1

    More efficient, same, less efficient, or truly horrible?

    When using multi-value parameters, I try to always bust the passed in string down to a table (variable or temp, depending on my mood), then join to the result.

    One definite takeaway for me will be to use the OPTION(RECOMPILE) in any of my procs where I’m allowing a NULL parameter.

  52. Same (I tested it on a forum post, not here).

    It has the same performance effects as the WHERE (@ProductID IS NULL OR ProductID = @ProductID), however I don’t know if OPTION(RECOMPILE) will help that form, depends whether the optimiser can recognise the construct or not (also make sure you’re on 2008 SP2 or above before trying RECOMPILE)

  53. Yo Gail.

    Is there any reason that you do a check on the @Where and then subtract the first AND, instead of starting @Where with a static ‘WHERE 1 = 1 ‘ and then adding the clauses with ANDs?

    I generally find it’s a little easier to debug because you are doing less string manipulation.

    Cheers.
    Transact Charlie.

  54. Because I heartily dislike the WHERE 1=1 form in the where clause, and I would never willingly use or promote code that contains it.

  55. Wonderful article. Thanks for posting. Was exactly the help i needed.

  56. This is very nice – BUT you cannot use it in a table-valued function (TVF), since it uses an EXEC!
    Any hint on how to use a dynamic WHERE in a TVF?

  57. Nice Article :-)

  58. Gail – How might we modify this to use LIKE and IN? If I modify it as follows:

    IF @firstname is not null
    SET @Where = @Where + 'AND OrderID LIKE (''@_OrderID%'') '

    … PRINT sSQL produces:
    WHERE firstname LIKE ('@_OrderID%')

    … which looks and executes correct when run manually. However I receive no results from the stored proc when run. Any ideas?

  59. Apologies – I screwed up the code in my previous post by trying to adapt it to your example. Please ignore and consider this instead:

    Gail – How might we modify this to use LIKE and IN? If I modify it as follows:

    IF @OrderID is not null
    SET @Where = @Where + 'AND OrderID LIKE (''@_OrderID%'') '

    … PRINT sSQL produces:
    WHERE OrderID LIKE ('@_OrderID%')

    … which looks correct as far as I can tell. However I receive no results from the stored proc when run. Any ideas?

  60. LIKE @Order + ''%''
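    In the dynamic build that would look something like this (a sketch, assuming the parameter is declared as a character type for the LIKE). The key point is that the variable reference stays outside the quoted literal; only the wildcard is part of the string:

```sql
IF @OrderID IS NOT NULL
    -- the parameter is referenced, not embedded in the literal;
    -- '%' is appended to its value at execution time
    SET @Where = @Where + 'AND OrderID LIKE @_OrderID + ''%'' '
```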

  61. […] Most of the theory behind this behaviour can be seen in this article: http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/ […]

  62. Gail, in Erland Sommarskog’s blog that you reference above, he suggests that building the query including the parameterized WHERE clause in client-side code is perfectly acceptable. Any thoughts?

  63. Perfectly valid; just make sure that it is parameterised correctly.

    I showed it as a stored proc because I often encounter the ‘catch-all’ form in existing procs and have to tune it without making any changes to the client app.

  64. Thank you for the great, simple, write up. I chose to use dynamic sql, which I have used before, but not in such a clean way. Another thing I like about doing the parameterized list is that it will often lead to shorter sql statements which is helpful when you have large queries that are approaching the 4000 NVARCHAR limit.

  65. Thank you for a great article! It really helped me. I have a small question though: how much performance advantage, in terms of time, does the dynamic query have over the other methods?
