Performance

Memory Grant Feedback and data skew

The new adaptive query processing features in SQL Server 2017 are useful for fixing performance problems that were previously very hard to fix. They’re not perfect though, and one of the problems with memory grant feedback in particular is that it’s sensitive to data skew.

Before I get into why, let’s look at what adaptive memory grant does in the first place.

Queries request memory for operations like sorts, hash joins, hash aggregates and a few other operators. This is not TempDB space (ideally), it’s just memory. The amount requested is based on the optimiser’s guesses as to the size of the data that will be hashed/sorted, and that’s based off statistics and parameter values. Hence, there’s a chance for the guess to be wrong, and when it is, we get things like this:

PlanWithSpill

SpillDetails

When spill happen, the intermediate resultsets (or parts of them) do get written to TempDB. And read back. And potentially written and read back again, and maybe a few more times. This can be horribly slow.

Of course, there’s a chance that the estimate will be wrong in the other direction. Too large. It’s not as obviously bad, but it can limit the throughput of the system. Instead of the query running really slowly, it may have to wait before it runs at all, waiting for the memory to be granted. (RESOURCE_SEMAPHORE).

These were really hard problems to fix. There isn’t a query hint to request more or less memory than the estimates would allocate (though you can specify, as a percentage of the resource pool, the max and min memory to be allocated), so fixes had to be creative, typically tricking the optimiser into thinking there were more or fewer rows than there really were, or that the rows were wider (there are some lovely tricks that can be done with CROSS APPLY for example)

Adaptive memory grants don’t do anything to correct the optimiser’s mis-estimates. What they do, is allow the query processor to learn from the mistakes. If a query’s memory grant is significantly over or under what is needed, then a note is made of that, somewhere in memory, and the next time the query runs, the memory grant is adjusted to a value based on what the previous execution needed.

So, if we run the example from above a second time, making absolutely no changes in the process, the spills are gone.

SpillGone

This is great, unless you have a particular pattern in your workload, where one query will sometimes have a small number of rows flowing through it, and sometimes a large number. This is not a problem specific to Memory Grant Feedback. It’s been around for a long time, we call it bad parameter sniffing in many cases.

So let’s try a test of running the same query multiple times, alternating between parameter values that return small row counts and parameter values that return large row counts. The plan is the same in all cases, it’s a reused cached plan, and it’s one that’s not bad for the larger row counts (hash join, hash aggregate), so we don’t have the typical bad parameter sniffing problem, but the memory grant will oscillate, being based upon the previous query’s execution. I’m going to execute the stored procedure 200 times.

And I should mention that this is an extreme case. I specifically constructed a scenario where the memory grant required by one execution would be completely inappropriate for the next one. This is not (I hope) something that would happen in the real world.

I monitored what was happening with Extended Events, with the memory_grant_updated_by_feedback and memory_grant_feedback_loop_disabled events.

The results were kinda as expected.

image

And then something interesting happened. I didn’t clear the cache or anything, this was as the procedures executed in a loop.

image

After 8 executions, each with a memory grant update, both the execution count and the count of updates to the grant reset to 1.

This happened again 8 executions later

image

And again 8 executions later

image

Then, finally, after 32 executions, the update is disabled.

image

The procedure then went on to execute a further 168 times, with the same memory grant each time, equal to the last updated value.

So what can we conclude from this?

Firstly, there seems to be a re-evaluation of the memory grant feedback process every 8 modifications, deciding whether to continue adjusting. Second, it will stop adjusting memory grants at some point, though the conditions aren’t documented and I can’t tell from the test I ran what the conditions are. Since they’re not documented, they will probably change in future CUs/versions without notice.

Once the feedback cycle stops, the last memory grant value is what will be used for that query until its plan is removed from cache, at which point the adjustment cycle starts over from scratch.

If you’re working with a system that has this kind of query, with wide differences in optimal memory grant, I would suggest not relying on memory grant feedback, and changing the code so that the grant needed is more constant. This may require splitting procedures up, optimise hints or other fixes for bad parameter sniffing.

I suggest that because the feedback works great for ‘dialling in’ a good value for needed memory grant, but not for cases where the optimal grant is constantly changing. The 200 executions above took 4 minutes total without memory grant feedback, but 12 minutes with memory grant feedback.

It’s a great solution when the original estimate doesn’t match what the query needs, but it’s sub-optimal for queries with constantly changing memory needs. Procedures with widely changing memory needs should be fixed with other methods, including but not limited to multiple procedures, dynamic SQL, plan forcing, or other query hints.

Revisiting catch-all queries

I originally wrote about catch-all queries early in 2009, just as something that I’d seen several times in client code. It turned into the 3rd most popular post ever on my blog.

A lot’s changed since 2009. When I wrote the original post, most production servers were SQL 2005 or SQL 2000. SQL 2008 had been out less than a year and its fix for catch-all queries, the RECOMPILE hint, didn’t even work properly (it had an incorrect results bug in RTM, was pulled in SP1 and fixed in SP2)

As such, my feelings on how to solve the problem with catch-all queries has changed over the years.

Before I get to solutions, let’s start with the root cause of the problem with catch-all queries – plan caching and the need for plans to be safe for reuse.

Let’s take a sample query. I’ll use the same one I used in the original post.

CREATE PROCEDURE SearchHistory
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE (ProductID = @Product Or @Product IS NULL)
AND (ReferenceOrderID = @OrderID OR @OrderID Is NULL)
AND (TransactionType = @TransactionType OR @TransactionType Is NULL)
AND (Quantity = @Qty Or @Qty is null)
GO

There are two nonclustered indexes on the TransactionHistory table, one on ProductID, one on ReferenceOrderID and ReferenceLineID.

For the initial discussion, let’s just consider two of the clauses in the WHERE. I’ll leave the other two in the stored proc, but they won’t be used.

WHERE (ProductID = @Product Or @Product IS NULL)
AND (ReferenceOrderID = @OrderID OR @OrderID Is NULL)

We would expect, if the ProductID parameter is passed, to get a seek using the index on ProductID, if the ReferenceOrderID parameter is passed, to get a seek using the index on ReferenceOrderID, and if both are passed, then either an index intersection or a seek on one of the indexes, key lookup and secondary filter for the other, plus, in all cases, a key lookup to fetch the columns for the SELECT.

That’s not what we get (I cleared the plan cache before running each of these).

ProductScan

OrderScan

The expected indexes are used, but they’re used for scans not seeks. Why? Let’s just consider the second plan for a bit.

The index aren’t used for seeks, because plans must be safe for reuse. If a plan was generated with an index seek, seeking for ReferenceOrderID = @OrderID, and that plan was cached and reused later when @OrderID was NULL, we’d get incorrect results. ReferenceOrderID = NULL matches no records.

And so we have index scans with the full predicate (ReferenceOrderID = @OrderID OR @OrderID Is NULL) applied after the index is read.

This is not particularly efficient, as the properties on the index seek shows.

InefficientIndexScan

The entire index, all 113443 rows were read, to return a single row. Not ideal, but it’s far from the largest problem with this form of query.

The plan’s got an index scan on the index on ReferenceOrderID, and then a key lookup back to the clustered index. That key lookup has a secondary filter on it, (ProductID = @Product Or @Product IS NULL). The optimiser assumed that a small number of rows would be returned from the index seek on ReferenceOrderID (1.47 to be specific), and hence the key lookup would be cheap, but that’s not going to be the case if the plan is reused with a ProductID passed to it instead of a ReferenceOrderID.

Before we look at that, the performance characteristics for the procedure being called with the ReferenceOrderID parameter are:

PerformanceOrder

The duration and CPU are both in microseconds, making this a very fast query, despite the index scan.

Now, without clearing the plan cache, I’m going to run the procedure with only the ProductID parameter passed.

PerformanceProduct

CPU’s gone from an average of 8ms to around 120ms. Duration has gone from average around 6ms to about 125ms and reads have jumped from 271 (2 MB of data processed) to 340 597 (2.6 GB of data processed)

And this is for a table that has 113k records and a query that returned 4 rows.

The key lookup, which was fine when an OrderID was passed, is not fine when @OrderID is NULL and the index scan returns the entire table.

ExpensiveIndexScan

ExpensiveKeyLookup

The plans that the optimiser has come up with for this query form aren’t stable. They’re safe for reuse, they have to be, but performance-wise they’re not stable.

But, maybe it’s just this form of query, there are other ways to write queries with multiple optional parameters.

Let’s try the CASE and COALESCE forms.

CREATE PROCEDURE SearchHistory_Coalesce
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE ProductID = COALESCE(@Product, ProductID)
AND ReferenceOrderID = COALESCE(@OrderID, ReferenceOrderID)
AND TransactionType = COALESCE(@TransactionType, TransactionType)
AND Quantity = COALESCE(@Qty, Quantity)
GO

CREATE PROCEDURE SearchHistory_Case
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE ProductID = CASE WHEN @Product IS NULL THEN ProductID ELSE @Product END
AND ReferenceOrderID = CASE WHEN @OrderID IS NULL THEN ReferenceOrderID ELSE @OrderID END
AND TransactionType = CASE WHEN @TransactionType IS NULL THEN TransactionType ELSE @TransactionType END
AND Quantity = CASE WHEN @Qty IS NULL THEN Quantity ELSE @Qty END
GO

Coalesce

Case

These both give us full table scans, rather than the index scan/key lookup we saw earlier. That means their performance will be predictable and consistent no matter what parameter values are used. Consistently bad, but at least consistent.

It’s also worth noting that neither of these will return correct results if there are NULL values in the columns used in the WHERE clause (because NULL != NULL). Thanks to Hugo Kornelis (b | t) for pointing this out.

And then two more forms that were mentioned in comments on the original post, slightly more complicated:

CREATE PROCEDURE SearchHistory_Case2
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT  ProductID,
ReferenceOrderID,
TransactionType,
Quantity,
TransactionDate,
ActualCost
FROM    Production.TransactionHistory
WHERE   (CASE WHEN @Product IS NULL THEN 1
WHEN @Product = ProductID THEN 1
ELSE 0
END) = 1
AND (CASE WHEN @OrderID IS NULL THEN 1
WHEN @OrderID = ReferenceOrderID THEN 1
ELSE 0
END) = 1
AND (CASE WHEN @TransactionType IS NULL THEN 1
WHEN @TransactionType = TransactionType THEN 1
ELSE 0
END) = 1
AND (CASE WHEN @Qty IS NULL THEN 1
WHEN @Qty = Quantity THEN 1
ELSE 0
END) = 1
GO

CREATE PROCEDURE SearchHistory_Complex
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT  ProductID,
ReferenceOrderID,
TransactionType,
Quantity,
TransactionDate,
ActualCost
FROM    Production.TransactionHistory
WHERE ((ProductID = @Product AND @Product IS NOT NULL) OR (@Product IS NULL))
AND ((ReferenceOrderID = @OrderID AND @OrderID IS NOT NULL) OR (@OrderID IS NULL))
AND ((TransactionType = @TransactionType AND @TransactionType IS NOT NULL) OR (@TransactionType IS NULL))
AND ((Quantity = @Qty AND @Qty IS NOT NULL) OR (@Qty IS NULL))

These two give the same execution plans as the first form we looked at, index scan and key lookup.

Performance-wise, we’re got two different categories of query. We’ve got some queries where the execution plan contains an index scan on one or other index on the table (depending on parameters passed) and a key lookup, and others where the execution plan contains a table scan (clustered index scan) no matter what parameters are passed.

But how do they perform? To test that, I’m going to start with an empty plan cache and run each query form 10 times with just the OrderID being passed and then 10 times with just the ProductID being passed, and aggregate the results.

Procedure Parameter CPU (ms) Duration (ms) Reads
SearchHistory OrderID 5.2 50 271
ProductID 123 173 340597
SearchHistory_Coalesce OrderID 7.8 43 805
ProductID 9.4 45 805
SearchHistory_Case OrderID 12.5 55 805
ProductID 7.8 60 804
SearchHistory_Case2 OrderID 10.5 48 272
ProductID 128 163 340597
SearchHistory_Complex OrderID 7.8 40 272
ProductID 127 173 340597

 

The query forms that had the clustered index scan in the plan have consistent performance. On large tables it will be consistently bad, it is a full table scan, but it will at least be consistent.

The query form that had the key lookup have erratic performance, no real surprise there, key lookups don’t scale well and looking up every single row in the table is going to hurt. And note that if I ran the queries in the reverse order on an empty plan cache, the queries with the ProductID passed would be fast and the queries with the OrderID would be slow.

So how do we fix this?

When I first wrote about this problem 7 years ago, I recommended using dynamic SQL and discussed the dynamic SQL solution in detail. The dynamic SQL solution still works very well, it’s not my preferred solution any longer however.

What is, is the RECOMPILE hint.

Yes, it does cause increased CPU usage due to the recompiles (and I know I’m likely to get called irresponsible and worse for recommending it), but in *most* cases that won’t be a huge problem. And if it is, use dynamic SQL.

I recommend considering the RECOMPILE hint first because it’s faster to implement and far easier to read. Dynamic SQL is harder to debug because of the lack of syntax highlighting and the increased complexity of the code. In the last 4 years, I’ve only had one case where I went for the dynamic SQL solution for a catch-all query, and that was on a server that was already high on CPU, with a query that ran many times a second.

From SQL 2008 SP2/SQL 2008 R2 onwards, the recompile hint relaxes the requirement that the generated plan be safe for reuse, since it’s never going to be reused. This firstly means that the plans generated for the queries can be the optimal forms, index seeks rather than index scans, and secondly will be optimal for the parameter values passed.

CatchallIndexSeek

And performance-wise?

RecompilePerformance

Reads down, duration down and CPU down even though we’re recompiling the plan on every execution (though this is quite a simple query, so we shouldn’t expect a lot of CPU to generate the plan).

How about the other forms, do they also improve with the RECOMPILE hint added? As I did before, I’m going to run each 10 times and aggregate the results, that after adding the RECOMPILE hint to each.

Procedure Parameter CPU (ms) Duration (ms) Reads
SearchHistory OrderID 0 1.3 28
ProductID 0 1.2 19
SearchHistory_Coalesce OrderID 6.2 1.2 28
ProductID 3.2 1.2 19
SearchHistory_Case OrderID 1.6 1.3 28
ProductID 0 1.2 19
SearchHistory_Case2 OrderID 7.8 15.6 232
ProductID 7.8 11.7 279
SearchHistory_Complex OrderID 1.5 1.4 28
ProductID 0 1.2 19

 

What can we conclude from that?

One thing we note is that the second form of case statement has a higher CPU, duration and reads than any other. If we look at the plan, it’s still running as an index scan/key lookup, despite the recompile hint.

The second thing is that the more complex forms perform much the same as the simpler forms, we don’t gain anything by adding more complex predicates to ‘guide’ the optimiser.

Third, the coalesce form might use slightly more CPU than the other forms, but I’d need to test a lot more to say that conclusively. The numbers we’ve got are small enough that there might well be measuring errors comparable to the number itself.

Hence, when this query form is needed, stick to the simpler forms of the query, avoid adding unnecessary predicates to ‘help’ the optimiser. Test the query with NULLs in the filtered columns, make sure it works as intended.

Consider the RECOMPILE hint first, over dynamic SQL, to make it perform well. If the query has long compile times or runs very frequently, then use dynamic SQL, but don’t automatically discount the recompile hint for fear of the overhead. In many cases it’s not that bad.

What is a SARGable predicate?

‘SARGable’ is a weird term. It gets bandied around a lot when talking about indexes and whether queries can seek on indexes. The term’s an abbreviation, ‘SARG’ stands for Search ARGument, and it means that the predicate can be executed using an index seek.

Lovely. So a predicate must be SARGable to be able to use an index seek, and it must be able to use an index seek to be SARGable. A completely circular definition.

So what does it actually mean for a predicate to be SARGable? (and we’ll assume for this discussion that there are suitable indexes available)

The most general form for a predicate is <expression> <operator> <expression>. To be SARGable, a predicate must, on one side, have a column, not an expression on a column. So, <column> <operator> <expression>

SELECT * FROM Numbers
WHERE Number = 42;

Seek1

SELECT * FROM Numbers
WHERE Number + 0 = 42;

Scan1

SELECT * FROM Numbers
WHERE Number = 42 + 0;

Seek2

Any1 function on a column will prevent an index seek from happening, even if the function would not change the column’s value or the way the operator is applied, as seen in the above case. Zero added to an integer doesn’t change the value of the column, but is still sufficient to prevent an index seek operation from happening.

While I haven’t yet found any production code where the predicate is of the form ‘Column + 0’ = @Value’, I have seen many cases where there are less obvious cases of functions on columns that do nothing other than to prevent index seeks.

UPPER(Column) = UPPER(@Variable) in a case-insensitive database is one of them, RTRIM(COLUMN) = @Variable is another. SQL ignores trailing spaces when comparing strings.

The other requirement for a predicate to be SARGable, for SQL Server at least, is that the column and expression are of the same data type or, if the data types differ, such that the expression will be implicitly converted to the data type of the column.

SELECT 1 FROM SomeTable
WHERE StringColumn = 0;

Scan2

SELECT 1 FROM SomeTable
WHERE StringColumn = ‘0’;

Seek3

There are some exceptions here. Comparing a DATE column to a DATETIME value would normally implicitly convert the column to DATETIME (more precise data type), but that doesn’t cause index scans. Neither does comparing an ascii column to a unicode string, at least in some collations.

In generally though, conversions should be explicit and decided on by the developer, not left up to what SQL server decides.

What about operators?

The majority are fine. Equality, Inequality, IN (with a list of values), IS NULL all allow index usage. EXIST and IN with a subquery are treated like joins, which may or may not use indexes depending on the join type chosen.

LIKE is a slight special case. Predicates with LIKE are only SARGable if the wildcard is not at the start of the string.

SELECT 1 FROM SomeStrings
WHERE ASCIIString LIKE 'A%'

Seek4

SELECT 1 FROM SomeStrings
WHERE ASCIIString LIKE '%A'

Scan3

There are blog posts that claim that adding NOT makes a predicate non-SARGable. In the general case that’s not true.

SELECT * FROM Numbers
WHERE NOT Number > 100;

Seek5

SELECT * FROM Numbers
WHERE NOT Number <= 100;

Seek6

SELECT * FROM Numbers
WHERE NOT Number = 137;

Seek7

These index seeks are returning most of the table, but there’s nothing in the definition of ‘SARGable’ that requires small portions of the table to be returned.

That’s mostly that for SARGable in SQL Server. It’s mostly about having no functions on the column and no implicit conversions of the column.

(1) An explicit CAST of a DATE column to DATETIME still leaves the predicate SARGable. This is an exception that’s been specifically coded into the optimiser.

Monitoring wait stats

This post, like last week’s, is based off the presentation I did to the DBA Fundamentals virtual chapter.

The request was for more details on the method I use to capture wait and file stats on servers, The methods are pretty similar, so I’ll show waits.

This is by no means the only way of doing it, it’s the way I do it.

Part the First: Capture job

This is the easy part. Into a job step goes the following:

INSERT  INTO Performance.dbo.WaitStats
SELECT  wait_type as WaitType,
        waiting_tasks_count AS NumberOfWaits,
        signal_wait_time_ms AS SignalWaitTime,
        wait_time_ms - signal_wait_time_ms AS ResourceWaitTime,
        GETDATE() AS SampleTime
FROM    sys.dm_os_wait_stats
WHERE   wait_time_ms > 0
    AND wait_type NOT IN (<list of waits to ignore>);

Schedule the job to run on an interval for a couple of days. I like to run it every 15 min, maybe every half an hour. I’m trying to get overall behaviour, not identify queries. If I need later to see what queries incur a particular wait, I can use an extended event session.

For the list of waits to ignore, I use Glenn’s list, the latest version found at http://www.sqlskills.com/blogs/glenn/sql-server-diagnostic-information-queries-detailed-day-14/

I run this no less than a day, preferably a week if I can. 2-3 days is normally what I get.

Part the Second: Analysis script

The analysis script does two things:

  • Get the wait times within an interval
  • Pivot them so that I can easily graph in excel

To see which waits I want to include in the pivot, I look at the 20 waits with the highest increase in the interval monitored (this requires that the server wasn’t restarted during it).

I’m not necessarily going to graph and analyse all of them, but it does help ensure I don’t miss something interesting (like, for example, high LCK_M_Sch_S locks every day between 08:00 and 08:45)

For the purposes of this post, let’s say the ones I’m interested in for a particular analysis are LCK_M_IX, PAGELATCH_EX, LATCH_EX and IO_COMPLETION.

To be clear, those are for this example only. Do Not copy the below code and run without specifying the waits you’re interested in looking at, or the results are going to be less than useless.

The first thing I want to do is add a Row_Number based on the times the wait stats were recorded, so that I can join and take the difference between one interval and the next. In theory it should be possible to do this with times, but the insert doesn’t occur at exactly the same time, to the millisecond, each interval, hence this would require fancy date manipulation. Easier to use a ROW_NUMBER

SELECT  WaitType,
        NumberOfWaits,
        SignalWaitTime,
        ResourceWaitTime,
        SampleTime,
        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
FROM    dbo.WaitStats
WHERE   WaitType IN ('LCK_M_IX', ‘PAGELATCH_EX’, 'LATCH_EX', 'IO_COMPLETION');

Next step, turn that into a CTE, join the CTE to itself with an offset and take the difference of the waiting tasks, the signal wait time and the resource wait time.

WITH    RawWaits
          AS (SELECT    WaitType,
                        NumberOfWaits,
                        SignalWaitTime,
                        ResourceWaitTime,
                        SampleTime,
                        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
              FROM      dbo.WaitStats
              WHERE     WaitType IN ('LCK_M_IX', ‘PAGELATCH_EX’, 'LATCH_EX', 'IO_COMPLETION')
             )
    SELECT  w1.SampleTime,
            w1.WaitType AS WaitType,
            w2.NumberOfWaits - w1.NumberOfWaits AS NumerOfWaitsInInterval,
            w2.ResourceWaitTime - w1.ResourceWaitTime AS WaitTimeInInterval,
            w2.SignalWaitTime - w1.SignalWaitTime AS SignalWaitTimeInInterval
    FROM    RawWaits w1
            LEFT OUTER JOIN RawWaits w2 ON w2.WaitType = w1.WaitType
                                           AND w2.Interval= w1.Interval + 1;

Last step, pivot the results. This will pivot and show the resource wait. Change the column that’s in the select and the pivot to show the others. It doesn’t matter what aggregation function is used because there’s only one value in each interval, so sum, avg, min and max will all give the same result (just, don’t use count)

WITH    RawWaits
          AS (SELECT    WaitType,
                        NumberOfWaits,
                        SignalWaitTime,
                        ResourceWaitTime,
                        SampleTime,
                        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
              FROM      dbo.WaitStats
              WHERE     WaitType IN ('LCK_M_IX', 'PAGELATCH_EX', 'LATCH_EX', 'IO_COMPLETION')
             ),
        WaitIntervals
          AS (SELECT    w1.SampleTime,
                        w1.WaitType AS WaitType,
                        w2.NumberOfWaits - w1.NumberOfWaits AS NumerOfWaitsInInterval,
                        w2.ResourceWaitTime - w1.ResourceWaitTime AS WaitTimeInInterval,
                        w2.SignalWaitTime - w1.SignalWaitTime AS SignalWaitTimeInInterval
              FROM      RawWaits w1
                        LEFT OUTER JOIN RawWaits w2 ON w2.WaitType = w1.WaitType
                                                       AND w2.Interval = w1.Interval + 1
             )
    SELECT  *
    FROM    (SELECT SampleTime, WaitType, WaitTimeInInterval FROM WaitIntervals
            ) p PIVOT ( AVG(WaitTimeInInterval) FOR WaitType IN ([LCK_M_IX], [PAGELATCH_EX], [LATCH_EX], [IO_COMPLETION]) ) AS pvt
    ORDER BY SampleTime;

And there we have a result that can easily be imported into excel (or R) and graphed or analysed further.

and other pointless query rewrites

In a similar vein to last week’s blog post… I heard an interesting comment recently. “Change that Column != 2 to a Column > 2 or Column < 2 combination, it can use indexes better.”

Sounds like something that clearly needs testing!

I’ll start with simple numbers table.

CREATE TABLE dbo.Numbers (
  Number INT NOT NULL
);

ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers
  PRIMARY KEY CLUSTERED (Number);

and put 1 million rows into it

INSERT INTO dbo.Numbers (Number)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2;

Let’s start, before we get into comparing things, with looking at the execution plan of a query with a != (or <>) operator.

SELECT Number
  FROM Numbers
  WHERE Number <> 12742; -- because 2 is on the first page of the index, and I don’t want any special cases here

InequalityExecutionPlan

That’s kinda complicated for a query with one table and one predicate. Let’s look at in pieces. The easiest place to start is the Clustered Index Seek. The seek predicate on the clustered index seek is

Seek Keys[1]: Start: [Test].[dbo].[Numbers].Number > Scalar Operator([Expr1009]), End: [Test].[dbo].[Numbers].Number < Scalar Operator([Expr1010])

Hmm…Looks like the parser/optimiser has already made our intended change for us. There’s some funky stuff in the top part of the plan, but what it’s essentially doing is generating two rows for the nested loop join, both with just the value that we’re excluding from the query, then the seek runs twice. I suspect that’s once for the less than 12742 and once for the greater than 12742 portions of the original predicate.

But, let’s do the full due diligence, the plan may not tell the whole story.

The performance numbers for the inequality form of the query, gathered via Extended Events and aggregated with Excel are:

Duration 122ms
CPU 105ms
Logical reads: 1619

This is our baseline, the numbers we’re comparing against. If the comment mentioned at the beginning is correct, then the revised query will have a significantly better performance.

The revised query is:

SELECT Number
  FROM Numbers
  WHERE Number > 12742 OR Number < 12742;

Execution plan is much simpler, no constant scans, no joins. Just a single index seek operation that executes once.

InequalityRevised

Is is better though?

Duration: 126ms
CPU: 103ms
Logical reads: 1619

No, it’s not.

Yes, we have a simpler plan, but we do not have a more efficient query. We’re still reading every page in the index and fetching all but one row of the table. The work required is the same, the performance characteristics are the same.

But, maybe, if the numbers aren’t unique and we’re excluding more than just one row it’ll be different.

That needs a slightly different table to test on.

CREATE TABLE MoreNumbers (
SomeNumber INT NOT NULL,
Filler CHAR(100)
);

CREATE CLUSTERED INDEX idx_MoreNumbers ON dbo.MoreNumbers(SomeNumber);
GO

INSERT INTO dbo.MoreNumbers (SomeNumber, Filler)
SELECT TOP (500000) NTILE(50) OVER (ORDER BY (SELECT 1)), ''
FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2;

I’m just going to look at the performance characteristics this time. The execution plans are the same as for the earlier query. The two queries are:

SELECT * FROM dbo.MoreNumbers WHERE SomeNumber != 24;
SELECT * FROM dbo.MoreNumbers WHERE SomeNumber < 24 OR SomeNumber > 24;

Query 1:

Duration 97ms
CPU 77ms
Logical Reads 7624

Query 2:

Duration 98ms
CPU 75ms
Logical Reads 7624

Again identical.

Just like with the pointless WHERE clause predicate last week, we have a query change that has had no effect on the query performance. Now, to be honest, there are some query form changes that can improve performance. For example, converting a set of OR predicates to UNION can improve query performance sometimes (and leave it unchanged in others), and so these kinds of rewrites do need to be tested to see if they’re useful.

More importantly though, those of us who are posting on forums and advising others have a responsibility to do these tests before we recommend changes to others, as they may very well not do them. If we don’t, we’re propagating myths and not helping the advancement of our field.

On the addition of useless where clauses

I remember a forum thread from a while back. The question was on how to get rid of the index scan that was in the query plan. Now that’s a poor question in the first place, as the scan might not be a problem, but it’s the first answer that really caught my attention.

Since the primary key is on an identity column, you can add a clause like ID > 0 to the query, then SQL will use an index seek.

Technically that’s correct. If the table has an identity column with the default properties (We’ll call it ID) and the clustered index is on that identity column, then a WHERE clause of the form WHERE ID > 0 AND <any other predicates on that table> can indeed execute with a clustered index seek (although it’s in no way guaranteed to do so). But is it a useful thing to do?

Time for a made up table and a test query.

CREATE TABLE dbo.Orders(
  OrderID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
  OrderDate DATETIME2(7) NULL,
  ShipmentRef CHAR(10) NULL,
  ShipmentDate DATE NULL,
  Status VARCHAR(20) NOT NULL
);

That’ll do the job. And then a few hundred thousand rows via SQL Data Generator and we’re good to go.

And for a query that has a nasty index scan, how about

SELECT  OrderDate,
    ShipmentDate,
    Status
  FROM    dbo.Orders
  WHERE   LTRIM(RTRIM(Status)) = 'Delivered';

ClusteredIndexScan

Now, that’s running as a clustered index scan because the predicate’s not SARGable and besides, there’s no index on that column, but let’s pretend we don’t know that.

If I add a WHERE clause predicate that filters no row out, can I get a query plan with an index seek?

SELECT  OrderDate,
    ShipmentDate,
    Status
  FROM    dbo.Orders
  WHERE   LTRIM(RTRIM(Status)) = 'Delivered'
    AND OrderID > 0;

Why yes, I can.

ClusteredIndexSeek

Op Success? Well…

The goal of performance tuning is to improve the performance of a query, not to change operators in a query plan. The plan is a tool, not a goal.

Have we, by adding a WHERE clause predicate that filters out no rows, improved performance of the query? This needs an extended events session to answer. Nothing fancy, just a sql_statement_completed event will do the trick.

I ran each query 10 times, copied the captured events into Excel and averaged them:

Query with just the LTRIM(RTRIM(Status)) = ‘Delivered’
CPU: 77ms
Duration: 543ms

Query with LTRIM(RTRIM(Status)) = ‘Delivered’ AND OrderID > 0
CPU: 80ms
Duration: 550ms

We haven’t tuned that query. I won’t say we’ve made it slower either, the differences are well within the error range on our measuring, but there’s definitely no meaningful performance gain.

There’s no gain because we haven’t changed how the query executes. A scan, and in this case it will be a scan of the entire index, will likely use the non-leaf levels of the b-tree to locate the logical first page of the leaf level, then will read the entire leaf level. The seek we managed to generate will use the b-tree to find the value 0 in the clustered index key, that’s what makes it a seek. Since the column is an identity starting at 1, that means the first row read will be on the logical first page of the leaf level, then it will read the entire leaf level.

Both will do the same amount of work, and so we haven’t done anything useful to the query by adding a WHERE clause that filters out no rows.

Scans are not always bad. If a query needs to read every row of a table, that’s a scan and effort shouldn’t be expended trying to make it an index seek.

To improve the performance of a query, we need to make changes that reduce the work needed to run the query. That often starts with reducing the amount of data that the query reads, by changing the query so that it can use indexes effectively and/or adding indexes to support the query. Not by adding pointless pieces to a query just to change plan operators from ones that are believed to be bad to ones that are believed to be good. Doing that is just a waste of time and effort.

Natively compiled user-defined functions

One new thing that SQL Server 2016 has added is the ability to natively compile user-defined functions. Previously, native compilation, part of the hekaton feature, was limited to stored procedures.

When I saw that, the first question that came to mind is whether natively compiling a scalar function reduces the overhead when calling that function within another query. I’m not talking about data-accessing scalar UDFs, since natively compiled functions can only access in-memory tables, but functions that do simple manipulation of the parameters passed in. String formatting, for example, or date manipulation.

While not as harmful as data-accessing scalar UDFs, there’s still overhead as these are not inline functions, they’re called for each row in the resultset (as a look at the Stored Procedure Completed XE event would show), and the call to the function takes time. Admittedly not a lot of time, but when it’s on each row of a large resultset the total can be noticeable.

I decided to do a simple test. A query against a table with ~600k rows, one query with a traditional scalar function, one with a natively compiled function and one with the function’s contents in the query as a column.

The two functions:

CREATE FUNCTION dbo.DateOnly (@Input DATETIME)
RETURNS DATETIME
AS
BEGIN
  RETURN DATEADD(dd, DATEDIFF (dd, 0, @Input), 0);
END
GO

and

CREATE FUNCTION dbo.DateOnlyHekaton (@Input DATETIME)
RETURNS DATETIME
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'English')
  RETURN DATEADD(dd,DATEDIFF(dd,0,@Input),0);
END
GO

And the query without the function would be a simple

SELECT DATEADD(dd, DATEDIFF(dd, 0, SomeDate), 0)
FROM SomeTable;

Durations and CPU usage were caught with Extended Events. I ran each query 25 times and aggregated the results.

Average CPU (ms) Average Duration (ms)
In-line expression 289 294
Normal function 3555 3814
Natively Compiled Function 3318 3352

Not quite what I hoped. While the overhead of the natively compiled function is lower, it’s lower only by about 10%, which really is not worth it, now when we’re talking about an order of magnitude difference from the query without the function call.

Looks like the guidance is still going to be to not use scalar UDFs within other queries.

Q&A from the DBA Fundamentals Virtual Chapter

A couple of weeks ago I did a presentation to the DBA Fundamentals virtual chapter. The presentation title was “What execution plans can tell you about query performance”

The slides and recording are available at the Virtual Chapter’s home page

I didn’t manage to get all of the questions answered, so here are a couple of slightly more involved questions which didn’t get answered.

Does the order of table matter when doing an inner join?

Short answer: No.

Long answer: Maybe, but it shouldn’t.

The optimiser decides which table is joined in which order. Putting a table first in the join clause does not mean it will be the first one processed. In general (as in, in ~99% of cases), put the tables in the join clause in the order which makes logical sense for the query.

Changing table order can, in some cases, change the plan. This doesn’t mean that SQL uses the order which the tables are specified in to determine the plan, it just means that changing the query resulted in the optimiser searching through the plan search space in a different way and finding a different ‘good enough’ plan. It’s not going to be deterministic and hence shouldn’t be relied on.

Will moving a filter from the WHERE to the INNER JOIN improve performance?

No, but again it can change the plan generated as described above. Personally I prefer joins in the JOIN clause and filters in the WHERE clause, because that’s what’s normal and expected.

Please note that moving filters from/to the WHERE clause from an OUTER JOIN changes the logic of the query and likely the results.

If multiple users are running the same query with different parameter values, will it result in different plans or recompiles?

Neither.

There will be one plan in cache (unless the SET options differ, but let’s ignore that for now). No matter what the parameter values are, when the same query is run, the plan will be fetched from cache and used.

Does index fragmentation have an effect on the join type chosen?

The Query Optimiser has no idea what logical fragmentation is. It doesn’t base its choices on how the pages are laid out in the data file. Logical fragmentation affects large range scans from disk, that’s all. If the pages are in memory, then fragmentation has no further effect.

Converting OR to Union

When I looked at indexing for queries containing predicates combined with OR, it became clear that the are some restrictive requirements for indexes for the optimiser to consider using the indexes for seek operations.

  • Each predicate (or set of predicates) combined with an OR must have a separate index
  • All of those indexes must be covering, or the row count of the concatenated result set low enough to make key lookups an option, as the optimiser does not apparent to consider the possibility of doing key lookups for a subset of the predicates before concatenating the result sets.

So what can be done if it’s not possible to meet those requirements?

The standard trick is to convert the query with ORs into multiple queries combined with UNION. The idea is that since OR predicates are evaluated separately and the result sets concatenated, we can do that manually by writing the queries separately and concatenating them using UNION or UNION ALL. (UNION ALL can only be safely used if the predicates are known to be mutually exclusive)

CREATE TABLE Persons (
PersonID INT IDENTITY PRIMARY KEY,
FirstName    VARCHAR(30),
Surname VARCHAR(30),
Country CHAR(3),
RegistrationDate DATE
)

CREATE INDEX idx_Persons_FirstName ON dbo.Persons (FirstName) INCLUDE (Surname)
CREATE INDEX idx_Persons_Surname ON dbo.Persons (Surname) INCLUDE (FirstName)
GO

-- Data population using SQLDataGenerator

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel' OR Surname = 'Barnes'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel'
UNION
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Barnes'

In this case, the OR can be replaced with a UNION and the results are the same. The Union form is slightly less efficient according to the execution plan’s costings (60% compared to the OR at 40%), and the two queries have the same general form, with two index seeks and some form of concatenation and remove duplicates.

OrResult1
OrExecPlan1

So in that case it worked fine, although the original form was a little more efficient
(more…)

To TOP or not to TOP an EXISTS

Earlier this year I had a look at a query pattern that I often see on forums and in production code, that of the Distinct within an IN subquery. Today I’m going to look at a similar patters, that being the use of TOP 1 within an EXISTS subquery.

Three tests. First a straightforward exists with no correlation (no where clause linking it to an outer query). Second, an exists with a complex query (one with a non-sargable where clause and a group by and having). Third an exists subquery correlated to the outer query.

Table structures are nice and simple, in fact, for ease I’m going to use the same tables as I did back on the exists, in and inner join tests. Code to create and populate the tables it attached to the end of the post.

First up, a simple exists query, in an IF, just to be different.

IF EXISTS (SELECT 1 FROM PrimaryTable_Medium)
PRINT 'Exists'

IF EXISTS (SELECT TOP (1) 1 FROM PrimaryTable_Medium)
PRINT 'Exists too'

For a benchmark, a SELECT 1 FROM PrimaryTable_Medium has the following IO characteristics

Table ‘PrimaryTable_Medium’. Scan count 1, logical reads 89, physical reads 0.

SQL Server Execution Times:
CPU time = 15 ms,  elapsed time = 510 ms.

Ignore the elapsed time, that’s likely mostly from displaying the records. I’m going to focus mostly on the CPU and IO.

Execution plans of the two exists variations are absolutely identical.

TopExists1

The index operators are scans because there is no way they could be anything else, there’s no predicate so a seek is not possible. That said, it’s not a full index scan. The properties of the Index Scan show 1 row only (actual and estimated). So SQL did not read the entire index, just enough to evaluate the EXISTS, and that’s what it did in both cases. IO stats confirm that.

(more…)