Latest Posts

Blocking operators and actual row counts

Query plans can sometimes be hard to read, and other times can be downright mystifying.

Take this plan for example. Not too hard in general. Two index seek/scan, a join, a sort and a filter. The peculiarity here is in the actual row counts. We expect that a join can filter rows out, that a filter can, well, filter rows out, that a top can reduce rows, that any aggregation can reduce the row count.

SortActualRowsBefore

SortActualRowsAfter

But why is a sort operator, a normal sort, reducing the row count? The answer lies in part not in how rows flow through the query plan, but in how control flows through the plan, and in part in the types of operators in the plan.

First let’s look at the types of operators. Here I don’t mean joins and aggregates and the like, I’m referring to whether an operator is a blocking operator or a non-blocking operator.

A non-blocking operator is one that consumes and produces rows at the same time. Nested loop joins are non-blocking operators.

A blocking operator is one that requires that all rows from the input have been consumed before a single row can be produced. Sorts are blocking operators.

Some operators can be somewhere between the two, requiring a group of rows to be consumed before an output row can be produced. Stream aggregates are an example here.

The sort in the plan is a blocking operator, and hence it needs all rows from the operator before it, the loop, before it can output any rows. That’s the 2920 going in to it, but why is there only 50 rows coming out?

That’s down to the way a query executes. Starting at the top of the plan, the top operator, in this case a SELECT asks the operator beneath it for a row. If the requested operator isn’t one that can generate a row (eg an index scan), then it asks the operator beneath it for a row.

The query that generated the shown plan had a filter based on the generated Row_Number of RowNumber between 26 and 50. This filter was executed by the Filter operator and partially by the Top operator.

RestOfPlan

FilterPropertiesTopProperties[3]

The TOP is there because the filter is on a Row_Number, the resultset is sorted by the columns defined in the Row_Number’s order by and there’s no partition by. The row numbered 50 will be the 50th row in the resultset and after that point there can be no more rows that satisfy the predicate. The query processor knows this.

So, the first row is requested by the select. The Filter can’t generate a row so it asks the Top for a row, and so on down the plan until we get to the sort.

The sort can’t request one row from the operator below it, it’s a blocking operator, it has to fetch all the rows from the operator below it. All 2920 of them.

Once the sort has all the rows, it sorts them and returns one row back to the previous operator. Repeat for the next row and the next.

Let’s fast-forward a few rows. The filter has just returned row 50 to the select operator. Select asks for the next row, row 51. The filter asks the top for the next row. The top, however, knows that it was only supposed to return the first 50 rows, and so instead it tells the filter operator that there are no more rows. The filter passes that up to the select and the query end there.

Hence why we have a sort further down the plan that only outputted 50 rows. Not because it filtered the rows itself, but because it was a blocking operator and the operators above it only asked for 50 rows.

It’s important to be able to read the execution plan in both directions. Reading the plan right-to-left is reading it in the direction of the data flow. Reading it left-to-right is reading it in the direction of the control flow. To fully understand plans it’s necessary to be able to do both.

Obsessing over query operator costs

A common problem when looking at execution plans is attributing too much meaning and value of the costs of operators.

The percentages shown for operators in query plans are based on costs generated by the query optimiser. They are not times, they are not CPU usage, they are not IO. The costs that the query optimiser generates are unit-less numbers that it uses internally to estimate the relative expense of plans as it optimises a query.

For starters, the percentages of operators should add to 100 across the entire plan, so worrying that the only data access operator in a simple query plan shows 100% is useless. If there’s only one index seek/scan in the plan, of course it’s going to be close to 100% of the total cost of the plan, there’s no where else for the cost to go.

But that’s not all. The accuracy of these estimates is based, in part, on the accuracy of the row estimations. If the statistics are out of date or there are any other row estimation errors, then the costs and as a result the percentages shown will be wildly incorrect

Take, for example, this query plan.

ExampleExecutionPlan

According to the percentages, the two operators were of equal cost. But a key lookup is just a single-row clustered index seek by a different name, and if we look at the number of executions of the two operators, it’s clear that they cannot possibly have been the same cost to execute

IndexSeek  KeyLookup

An index seek to return 1 million rows and a million index seeks to fetch one row each are not going to take the same amount of resources to execute, and hence those percentages are completely misleading.

Because the percentages can easily be way out, focusing on them when performance tuning is potentially going to result in a lot of wasted time. There is no single value, counter, measure or result that’s going to by itself indicate the cause of performance problems. Obsessing over single data points, or focusing on changing a single data point is almost certainly going to waste time.

Oh, and if anyone still wants to attribute importance to the percentages…

CostPercentages

Monitoring wait stats

This post, like last week’s, is based off the presentation I did to the DBA Fundamentals virtual chapter.

The request was for more details on the method I use to capture wait and file stats on servers, The methods are pretty similar, so I’ll show waits.

This is by no means the only way of doing it, it’s the way I do it.

Part the First: Capture job

This is the easy part. Into a job step goes the following:

INSERT  INTO Performance.dbo.WaitStats
SELECT  wait_type as WaitType,
        waiting_tasks_count AS NumberOfWaits,
        signal_wait_time_ms AS SignalWaitTime,
        wait_time_ms - signal_wait_time_ms AS ResourceWaitTime,
        GETDATE() AS SampleTime
FROM    sys.dm_os_wait_stats
WHERE   wait_time_ms > 0
    AND wait_type NOT IN (<list of waits to ignore>);

Schedule the job to run on an interval for a couple of days. I like to run it every 15 min, maybe every half an hour. I’m trying to get overall behaviour, not identify queries. If I need later to see what queries incur a particular wait, I can use an extended event session.

For the list of waits to ignore, I use Glenn’s list, the latest version found at http://www.sqlskills.com/blogs/glenn/sql-server-diagnostic-information-queries-detailed-day-14/

I run this no less than a day, preferably a week if I can. 2-3 days is normally what I get.

Part the Second: Analysis script

The analysis script does two things:

  • Get the wait times within an interval
  • Pivot them so that I can easily graph in excel

To see which waits I want to include in the pivot, I look at the 20 waits with the highest increase in the interval monitored (this requires that the server wasn’t restarted during it).

I’m not necessarily going to graph and analyse all of them, but it does help ensure I don’t miss something interesting (like, for example, high LCK_M_Sch_S locks every day between 08:00 and 08:45)

For the purposes of this post, let’s say the ones I’m interested in for a particular analysis are LCK_M_IX, PAGELATCH_EX, LATCH_EX and IO_COMPLETION.

To be clear, those are for this example only. Do Not copy the below code and run without specifying the waits you’re interested in looking at, or the results are going to be less than useless.

The first thing I want to do is add a Row_Number based on the times the wait stats were recorded, so that I can join and take the difference between one interval and the next. In theory it should be possible to do this with times, but the insert doesn’t occur at exactly the same time, to the millisecond, each interval, hence this would require fancy date manipulation. Easier to use a ROW_NUMBER

SELECT  WaitType,
        NumberOfWaits,
        SignalWaitTime,
        ResourceWaitTime,
        SampleTime,
        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
FROM    dbo.WaitStats
WHERE   WaitType IN ('LCK_M_IX', ‘PAGELATCH_EX’, 'LATCH_EX', 'IO_COMPLETION');

Next step, turn that into a CTE, join the CTE to itself with an offset and take the difference of the waiting tasks, the signal wait time and the resource wait time.

WITH    RawWaits
          AS (SELECT    WaitType,
                        NumberOfWaits,
                        SignalWaitTime,
                        ResourceWaitTime,
                        SampleTime,
                        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
              FROM      dbo.WaitStats
              WHERE     WaitType IN ('LCK_M_IX', ‘PAGELATCH_EX’, 'LATCH_EX', 'IO_COMPLETION')
             )
    SELECT  w1.SampleTime,
            w1.WaitType AS WaitType,
            w2.NumberOfWaits - w1.NumberOfWaits AS NumerOfWaitsInInterval,
            w2.ResourceWaitTime - w1.ResourceWaitTime AS WaitTimeInInterval,
            w2.SignalWaitTime - w1.SignalWaitTime AS SignalWaitTimeInInterval
    FROM    RawWaits w1
            LEFT OUTER JOIN RawWaits w2 ON w2.WaitType = w1.WaitType
                                           AND w2.Interval= w1.Interval + 1;

Last step, pivot the results. This will pivot and show the resource wait. Change the column that’s in the select and the pivot to show the others. It doesn’t matter what aggregation function is used because there’s only one value in each interval, so sum, avg, min and max will all give the same result (just, don’t use count)

WITH    RawWaits
          AS (SELECT    WaitType,
                        NumberOfWaits,
                        SignalWaitTime,
                        ResourceWaitTime,
                        SampleTime,
                        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
              FROM      dbo.WaitStats
              WHERE     WaitType IN ('LCK_M_IX', 'PAGELATCH_EX', 'LATCH_EX', 'IO_COMPLETION')
             ),
        WaitIntervals
          AS (SELECT    w1.SampleTime,
                        w1.WaitType AS WaitType,
                        w2.NumberOfWaits - w1.NumberOfWaits AS NumerOfWaitsInInterval,
                        w2.ResourceWaitTime - w1.ResourceWaitTime AS WaitTimeInInterval,
                        w2.SignalWaitTime - w1.SignalWaitTime AS SignalWaitTimeInInterval
              FROM      RawWaits w1
                        LEFT OUTER JOIN RawWaits w2 ON w2.WaitType = w1.WaitType
                                                       AND w2.Interval = w1.Interval + 1
             )
    SELECT  *
    FROM    (SELECT SampleTime, WaitType, WaitTimeInInterval FROM WaitIntervals
            ) p PIVOT ( AVG(WaitTimeInInterval) FOR WaitType IN ([LCK_M_IX], [PAGELATCH_EX], [LATCH_EX], [IO_COMPLETION]) ) AS pvt
    ORDER BY SampleTime;

And there we have a result that can easily be imported into excel (or R) and graphed or analysed further.

Q&A from the DBA Fundamentals Virtual Chapter presentation

A couple weeks ago I presented to the DBA Fundamentals virtual chapter. The presentation was recorded and is available from their site.

While I answered some questions during the presentation, I couldn’t answer all of them. Hence this blog post with the rest of the questions and some answers.

Q1: Is monitoring any different in Azure SQL DB?

A1: Completely different. What I was talking about when the question was asked was perfmon counters and wait stats. Since you don’t have access to the server with the SQL DB, you can’t run perfmon. Even if you could, there’s unknown other workloads on the server which would make any such monitoring useless. Instead you can use the DMV sys.dm_db_resource_stats, which gives you the resource consumption relative to the maximum allowed for the tier that you’re paying for. For more details, see https://azure.microsoft.com/en-us/blog/azure-sql-database-introduces-new-near-real-time-performance-metrics/

The wait stats can be monitored with the DMV sys.dm_db_wait_stats, instead of sys.dm_os_wait_stats that you’d use on an earthed SQL Server. See https://msdn.microsoft.com/en-us/library/dn269834.aspx

Q2: What interval should we use for perfmon and how long should it be run?

A2: Personally I’m happy using the 15 second default in most cases. Perfmon has minimal overhead and the files aren’t large. If I’m trying to pin down an intermittent issue I’ll reduce the time, but I’ll very rarely increase it.

When analysing a server, I want minimum a day and that’s bare minimum. A week is good, that way I can see trends over several days and not be caught out by any non-standard workloads on one day.

Q3: Use performance monitor or sys.dm_os_performance_counters

Perfmon. Running a job every 15 seconds is hard and only the SQL counters are available through the DMV, so I’ll just use a performance monitor counter trace and save out as a binary file.

Q4: Is high CXPacket a problem?

By itself, all CXPacket waits mean is that queries are running in parallel. To determine whether that’s a problem or not requires looking at the queries that are running in parallel and seeing whether they should be, or whether they should be serial.

Most cases I’ve seen recently with very high CXPacket waits and very high Access_Methods_Dataset_Parent latch waits have been a result of inefficient queries and poor indexing, not a problem with parallelism itself.

http://sqlperformance.com/2015/06/sql-performance/knee-jerk-wait-statistics-cxpacket

Q5: What should average PLE be?

The higher the better. It measures how long, on average, a page stays in cache. Lower numbers mean more churn of the buffer pool. There’s no one number where above is good and below is bad.

http://sqlperformance.com/2014/10/sql-performance/knee-jerk-page-life-expectancy

and other pointless query rewrites

In a similar vein to last week’s blog post… I heard an interesting comment recently. “Change that Column != 2 to a Column > 2 or Column < 2 combination, it can use indexes better.”

Sounds like something that clearly needs testing!

I’ll start with simple numbers table.

CREATE TABLE dbo.Numbers (
  Number INT NOT NULL
);

ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers
  PRIMARY KEY CLUSTERED (Number);

and put 1 million rows into it

INSERT INTO dbo.Numbers (Number)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2;

Let’s start, before we get into comparing things, with looking at the execution plan of a query with a != (or <>) operator.

SELECT Number
  FROM Numbers
  WHERE Number <> 12742; -- because 2 is on the first page of the index, and I don’t want any special cases here

InequalityExecutionPlan

That’s kinda complicated for a query with one table and one predicate. Let’s look at in pieces. The easiest place to start is the Clustered Index Seek. The seek predicate on the clustered index seek is

Seek Keys[1]: Start: [Test].[dbo].[Numbers].Number > Scalar Operator([Expr1009]), End: [Test].[dbo].[Numbers].Number < Scalar Operator([Expr1010])

Hmm…Looks like the parser/optimiser has already made our intended change for us. There’s some funky stuff in the top part of the plan, but what it’s essentially doing is generating two rows for the nested loop join, both with just the value that we’re excluding from the query, then the seek runs twice. I suspect that’s once for the less than 12742 and once for the greater than 12742 portions of the original predicate.

But, let’s do the full due diligence, the plan may not tell the whole story.

The performance numbers for the inequality form of the query, gathered via Extended Events and aggregated with Excel are:

Duration 122ms
CPU 105ms
Logical reads: 1619

This is our baseline, the numbers we’re comparing against. If the comment mentioned at the beginning is correct, then the revised query will have a significantly better performance.

The revised query is:

SELECT Number
  FROM Numbers
  WHERE Number > 12742 OR Number < 12742;

Execution plan is much simpler, no constant scans, no joins. Just a single index seek operation that executes once.

InequalityRevised

Is is better though?

Duration: 126ms
CPU: 103ms
Logical reads: 1619

No, it’s not.

Yes, we have a simpler plan, but we do not have a more efficient query. We’re still reading every page in the index and fetching all but one row of the table. The work required is the same, the performance characteristics are the same.

But, maybe, if the numbers aren’t unique and we’re excluding more than just one row it’ll be different.

That needs a slightly different table to test on.

CREATE TABLE MoreNumbers (
SomeNumber INT NOT NULL,
Filler CHAR(100)
);

CREATE CLUSTERED INDEX idx_MoreNumbers ON dbo.MoreNumbers(SomeNumber);
GO

INSERT INTO dbo.MoreNumbers (SomeNumber, Filler)
SELECT TOP (500000) NTILE(50) OVER (ORDER BY (SELECT 1)), ''
FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2;

I’m just going to look at the performance characteristics this time. The execution plans are the same as for the earlier query. The two queries are:

SELECT * FROM dbo.MoreNumbers WHERE SomeNumber != 24;
SELECT * FROM dbo.MoreNumbers WHERE SomeNumber < 24 OR SomeNumber > 24;

Query 1:

Duration 97ms
CPU 77ms
Logical Reads 7624

Query 2:

Duration 98ms
CPU 75ms
Logical Reads 7624

Again identical.

Just like with the pointless WHERE clause predicate last week, we have a query change that has had no effect on the query performance. Now, to be honest, there are some query form changes that can improve performance. For example, converting a set of OR predicates to UNION can improve query performance sometimes (and leave it unchanged in others), and so these kinds of rewrites do need to be tested to see if they’re useful.

More importantly though, those of us who are posting on forums and advising others have a responsibility to do these tests before we recommend changes to others, as they may very well not do them. If we don’t, we’re propagating myths and not helping the advancement of our field.

On the addition of useless where clauses

I remember a forum thread from a while back. The question was on how to get rid of the index scan that was in the query plan. Now that’s a poor question in the first place, as the scan might not be a problem, but it’s the first answer that really caught my attention.

Since the primary key is on an identity column, you can add a clause like ID > 0 to the query, then SQL will use an index seek.

Technically that’s correct. If the table has an identity column with the default properties (We’ll call it ID) and the clustered index is on that identity column, then a WHERE clause of the form WHERE ID > 0 AND <any other predicates on that table> can indeed execute with a clustered index seek (although it’s in no way guaranteed to do so). But is it a useful thing to do?

Time for a made up table and a test query.

CREATE TABLE dbo.Orders(
  OrderID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
  OrderDate DATETIME2(7) NULL,
  ShipmentRef CHAR(10) NULL,
  ShipmentDate DATE NULL,
  Status VARCHAR(20) NOT NULL
);

That’ll do the job. And then a few hundred thousand rows via SQL Data Generator and we’re good to go.

And for a query that has a nasty index scan, how about

SELECT  OrderDate,
    ShipmentDate,
    Status
  FROM    dbo.Orders
  WHERE   LTRIM(RTRIM(Status)) = 'Delivered';

ClusteredIndexScan

Now, that’s running as a clustered index scan because the predicate’s not SARGable and besides, there’s no index on that column, but let’s pretend we don’t know that.

If I add a WHERE clause predicate that filters no row out, can I get a query plan with an index seek?

SELECT  OrderDate,
    ShipmentDate,
    Status
  FROM    dbo.Orders
  WHERE   LTRIM(RTRIM(Status)) = 'Delivered'
    AND OrderID > 0;

Why yes, I can.

ClusteredIndexSeek

Op Success? Well…

The goal of performance tuning is to improve the performance of a query, not to change operators in a query plan. The plan is a tool, not a goal.

Have we, by adding a WHERE clause predicate that filters out no rows, improved performance of the query? This needs an extended events session to answer. Nothing fancy, just a sql_statement_completed event will do the trick.

I ran each query 10 times, copied the captured events into Excel and averaged them:

Query with just the LTRIM(RTRIM(Status)) = ‘Delivered’
CPU: 77ms
Duration: 543ms

Query with LTRIM(RTRIM(Status)) = ‘Delivered’ AND OrderID > 0
CPU: 80ms
Duration: 550ms

We haven’t tuned that query. I won’t say we’ve made it slower either, the differences are well within the error range on our measuring, but there’s definitely no meaningful performance gain.

There’s no gain because we haven’t changed how the query executes. A scan, and in this case it will be a scan of the entire index, will likely use the non-leaf levels of the b-tree to locate the logical first page of the leaf level, then will read the entire leaf level. The seek we managed to generate will use the b-tree to find the value 0 in the clustered index key, that’s what makes it a seek. Since the column is an identity starting at 1, that means the first row read will be on the logical first page of the leaf level, then it will read the entire leaf level.

Both will do the same amount of work, and so we haven’t done anything useful to the query by adding a WHERE clause that filters out no rows.

Scans are not always bad. If a query needs to read every row of a table, that’s a scan and effort shouldn’t be expended trying to make it an index seek.

To improve the performance of a query, we need to make changes that reduce the work needed to run the query. That often starts with reducing the amount of data that the query reads, by changing the query so that it can use indexes effectively and/or adding indexes to support the query. Not by adding pointless pieces to a query just to change plan operators from ones that are believed to be bad to ones that are believed to be good. Doing that is just a waste of time and effort.

Does an index scan always read the entire index?

No.

That’s a bit short for a blog post, so let me explain. First, the difference between a seek and a scan.

A seek is an operation which navigates down the index’s b-tree looking for a row or for the start/end of a range of rows. A seek requires a predicate and that predicate must be of the form that can be used as a search argument (SARGable).

A scan is a read of the leaf level of an index, possibly also reading the intermediate pages as well.

The key there is that a seek requires a predicate. If there’s no predicate, we cannot have a seek and hence must have a scan.

Let’s look at a couple of examples. I’m going to use a simple Numbers table as it’s perfectly adequate for what we’re doing here.

CREATE TABLE Numbers (
  Number INT NOT NULL PRIMARY KEY CLUSTERED
);

INSERT INTO Numbers (Number)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM msdb.sys.columns c1 CROSS JOIN msdb.sys.columns c2;

With the table created, I want to look at how many pages that clustered index has. It won’t be a huge number, the table is very small.

SELECT OBJECT_NAME(object_id) AS TableName, index_level, page_count, record_count, avg_record_size_in_bytes
  FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Numbers'),1,NULL, 'Detailed');

IndexPages

Three level deep clustered index, 1608 pages at the leaf and a total of 1614 pages in all levels.

Let’s start with something basic:

SET STATISTICS IO ON;
GO

SELECT * FROM Numbers;

FullTableScan

Table ‘Numbers’. Scan count 1, logical reads 1615, physical reads 0

That read the entire index, every page at every level. The extra page was likely the IAM page for that index. That’s kinda what we expect a scan to be.

For contrast, let’s try an obvious seek.

SELECT * FROM Numbers WHERE Number = 137;

Seek

Table ‘Numbers’. Scan count 0, logical reads 3, physical reads 0.

A single-row seek operation does three reads, which makes sense since we have a three-level deep clustered index.

Now, what about this?

SELECT TOP (1) * FROM Numbers;

It can’t be a seek operation, there’s no predicate. The only way this can be implemented is with a scan.

Top1Scan

It is indeed a scan, but did it read the entire table?

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

No. A scan of the entire index is over 1600 pages. This query read three. It’s a scan, but it’s a scan which stopped after reading one page of the leaf (the other two are likely the root and intermediate pages, used to locate the first page in the leaf).

The scan read one row and then stopped, because that’s all that was needed. It did that, because there was a 1 row row-goal added to the query. For more details on row goals, see Paul White’s article on the subject.

There’s other cases where a scan won’t read the entire index leaf level too.

Aggregations, well MIN and MAX of an indexed column:

SELECT MIN(Number) FROM Numbers;

IndexScanMin

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

EXISTS:

IF EXISTS(SELECT 1 FROM Numbers)
SELECT 1;

ExistsScan

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

In conclusion, a seek operation requires a predicate. Without a predicate, a query has to be evaluated with a scan, but a scan doesn’t always mean that the entire table is read.

Stop using Task Manager to check SQL’s memory usage!

There’s two fairly common questions I see on the forums around SQL Server’s memory usage. Either the question asks why SQL’s using too much memory, or why it’s using too little.

Too much memory isn’t usually a real problem, it’s often due to max server memory being left at its default of 2048TB, along with a lack of understanding of how SQL uses memory.

Too little memory used is also often not a real problem, rather it’s usually from using the wrong tools to check SQL Server’s memory usage.

Let’s start by looking at an example.

This particular server has 16 GB of memory, and SQL Server’s max server memory is set to 10GB. Since the last restart of the instance, I’ve run SELECT * FROM .. against every table in a 30GB database. That should have warmed the cache up nicely.

MemoryTaskManager

Err, what? SQL Server’s not even using 100MB? I’ve just read 30GB of data and SQL Server’s not even using 1% of it’s allowed memory!!!

Or is it?

TotalServerMemory

A look at perfmon shows a completely different picture. Total and target server memory are both at 10GB. So why is Task Manager showing such a low figure?

LockedPagesInErrorLog

The service account that SQL’s running under has been granted the Lock Pages in Memory permission. This means that SQL’s not using the normal Windows memory routines to allocate memory.

Normally, SQL Server and other Windows applications allocate memory using the VirtualAlloc API call. This allocates virtual memory which is pageable. When SQL Server has been given the Lock Pages in Memory permission, it doesn’t use VirtualAlloc to allocate memory. Instead it uses the API call AllocateUserPhysicalPages. Memory allocated with this API call is not pageable, it has to remain in physical memory.

Task Manager’s memory counters (the Working Set ones) only show memory that’s been allocated using VirtualAlloc. Hence, when SQL Server has the Lock Pages in Memory permission and is allocating most of its memory using AllocateUserPhysicalPages, its memory usage in Task Manager will appear to be extraordinarily low. As far as I recall, in that case it’s only the non-buffer memory which is still allocated with VirtualAlloc, and that’s just things like the thread stacks, the CLR memory, backup buffers, and similar. It’s not the buffer pool. The buffer pool gets allocated with AllocateUserPhysicalPages.

If I remove the Lock Pages in Memory and re-run the test, Task Manager shows completely different values

MemoryTaskManager_NoLockedPages

Now the buffer pool is being allocated with VirtualAlloc and so Task Manager shows the full 10GB of memory usage.

In summary, Task Manager can show a completely incorrect value for SQL Server’s memory usage if the SQL service account has the Lock Pages in Memory permission. This can lead to a lot of wasted time if it is concluded that Task Manager is correct and SQL Server is using little memory.

Rather leave Task Manager alone and use perfmon and the DMVs to check what SQL Server’s memory allocation actually is. They’ll both be correct whether Lock Pages are being used or not.

Natively compiled user-defined functions

One new thing that SQL Server 2016 has added is the ability to natively compile user-defined functions. Previously, native compilation, part of the hekaton feature, was limited to stored procedures.

When I saw that, the first question that came to mind is whether natively compiling a scalar function reduces the overhead when calling that function within another query. I’m not talking about data-accessing scalar UDFs, since natively compiled functions can only access in-memory tables, but functions that do simple manipulation of the parameters passed in. String formatting, for example, or date manipulation.

While not as harmful as data-accessing scalar UDFs, there’s still overhead as these are not inline functions, they’re called for each row in the resultset (as a look at the Stored Procedure Completed XE event would show), and the call to the function takes time. Admittedly not a lot of time, but when it’s on each row of a large resultset the total can be noticeable.

I decided to do a simple test. A query against a table with ~600k rows, one query with a traditional scalar function, one with a natively compiled function and one with the function’s contents in the query as a column.

The two functions:

CREATE FUNCTION dbo.DateOnly (@Input DATETIME)
RETURNS DATETIME
AS
BEGIN
  RETURN DATEADD(dd, DATEDIFF (dd, 0, @Input), 0);
END
GO

and

CREATE FUNCTION dbo.DateOnlyHekaton (@Input DATETIME)
RETURNS DATETIME
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'English')
  RETURN DATEADD(dd,DATEDIFF(dd,0,@Input),0);
END
GO

And the query without the function would be a simple

SELECT DATEADD(dd, DATEDIFF(dd, 0, SomeDate), 0)
FROM SomeTable;

Durations and CPU usage were caught with Extended Events. I ran each query 25 times and aggregated the results.

Average CPU (ms) Average Duration (ms)
In-line expression 289 294
Normal function 3555 3814
Natively Compiled Function 3318 3352

Not quite what I hoped. While the overhead of the natively compiled function is lower, it’s lower only by about 10%, which really is not worth it, now when we’re talking about an order of magnitude difference from the query without the function call.

Looks like the guidance is still going to be to not use scalar UDFs within other queries.

Book Review 2015

It’s been a few years since I wrote a review of books I’ve read, so this isn’t going to list everything I’ve read since 2012, just a few of the best books or series. For the entire list of all books, see the library page.

So, 2015. I set a goal of 50 (new) books to read in the year, same goal I’d set the previous few years. I read 70 in total. This doesn’t include re-reads of old favourites, just new books. For 2016 I’m going to set a goal of 75, that’s an average of 6.25 books a month, or 1.44 books a week.

Books2015Some of the best books or series of books I’ve read in the last year are:

Hounded (Iron Druid Chronicles)

Author: Kevin Hearne

This is a brilliant series. Atticus O’Sullivan is a druid, born during the Roman occupation of Britain and currently living in Arizona, running a second hand book shop. He’s hiding from a few deities (and other supernatural creatures) who he’s annoyed over the centuries, but his quiet life is about to come to an end.

Good character, good plotting, great humour. The series is a fantastic read.

Amazon link: Hounded

Dauntless (The Lost Fleet)

Author: Jack Campbell

Amazon link: Dauntless

Rivers of London

Probationary Constable Peter Grant is just trying to do his job, and avoid a transfer to the Case Progression Unit, when he interviews a murder witness who died a couple hundred years ago. The witness, not the victim. Then his life gets really strange.

This series is so much fun. British humour, magic, the weirdness that is London (and surrounds) and fun characters.

This is the first of a series. The other books are Moon over Soho, Whispers Underground, Broken Homes and Foxglove Summer. Rivers of London was also released under the title Midnight Riot.

Amazon link: Rivers of London

In Her Name: Redemption

Humans have been fighting a century-long war with the Kreegan Empire, and they’re losing. Reza Gard, orphaned at a young age due to the war, is kidnapped by the Kreegans and forced to live among his enemies, but he may hold the key the fulfilment of a 10 000 year old Kreegan prophecy, and to the end of the war.

Excellent science fiction, well plotted, good characters.

Amazon link: Redemption. This is the complete text of the Redemption trilogy, Empire, Confederation and Final Battle. There’s a trilogy that discusses the start of the war, The Last War, comprised of First Contact, Legend of the Sword and Dead Soul, but I didn’t enjoy it quite as much. First Contact is excellent, the other two aren’t quite as good.

Ready Player One

A story of a search through an online virtual world for a massive prize, nothing less than the complete ownership of that virtual world and several billion dollars. Lots and lots of geek references, games, TV shows, movies, etc.

Amazon link: Ready Player One

The Laundry Files

What if magic was real? What if magic could be done by solving certain mathematical equations? What if that meant that computer programmers were actually magicians?

That’s the premise of the Laundry Files. Magic is a branch of applied mathematics, and computers are really good at maths. Unfortunately, such magic tends to have unpleasant consequences, like interdimensional horrors eating your brain. Hence existence of a government department trying to keep magic under control, and put off the end of the world for a few more years.

This is another series set (mostly) in London with more typical British humour.

Amazon Link: The Atrocity Archives.

The Stormlight Archive

Brandon Sanderson did a wonderful job finishing off the Wheel of Time series, now he’s starting his own epic fantasy series, and it’s good.

The world of Roshar is scoured by hurricane-strength storms that cross from the east to west every few days. On this battered world kings and armies fight for power, for wealth and for legendary weapons called Shardblades. But there’s hints from various sources that something terrible is coming, something that will tear the world apart.

Amazon Link: The Way of Kings. Depending on the publisher, the books are either sold in 2 parts or as a single book. The two published titles of the Stormlight Archive are “The Way of Kings” and “Words of Radiance”.