Latest Posts

Monitoring wait stats

This post, like last week’s, is based off the presentation I did to the DBA Fundamentals virtual chapter.

The request was for more details on the method I use to capture wait and file stats on servers, The methods are pretty similar, so I’ll show waits.

This is by no means the only way of doing it, it’s the way I do it.

Part the First: Capture job

This is the easy part. Into a job step goes the following:

INSERT  INTO Performance.dbo.WaitStats
SELECT  wait_type as WaitType,
        waiting_tasks_count AS NumberOfWaits,
        signal_wait_time_ms AS SignalWaitTime,
        wait_time_ms - signal_wait_time_ms AS ResourceWaitTime,
        GETDATE() AS SampleTime
FROM    sys.dm_os_wait_stats
WHERE   wait_time_ms > 0
    AND wait_type NOT IN (<list of waits to ignore>);

Schedule the job to run on an interval for a couple of days. I like to run it every 15 min, maybe every half an hour. I’m trying to get overall behaviour, not identify queries. If I need later to see what queries incur a particular wait, I can use an extended event session.

For the list of waits to ignore, I use Glenn’s list, the latest version found at http://www.sqlskills.com/blogs/glenn/sql-server-diagnostic-information-queries-detailed-day-14/

I run this no less than a day, preferably a week if I can. 2-3 days is normally what I get.

Part the Second: Analysis script

The analysis script does two things:

  • Get the wait times within an interval
  • Pivot them so that I can easily graph in excel

To see which waits I want to include in the pivot, I look at the 20 waits with the highest increase in the interval monitored (this requires that the server wasn’t restarted during it).

I’m not necessarily going to graph and analyse all of them, but it does help ensure I don’t miss something interesting (like, for example, high LCK_M_Sch_S locks every day between 08:00 and 08:45)

For the purposes of this post, let’s say the ones I’m interested in for a particular analysis are LCK_M_IX, PAGELATCH_EX, LATCH_EX and IO_COMPLETION.

To be clear, those are for this example only. Do Not copy the below code and run without specifying the waits you’re interested in looking at, or the results are going to be less than useless.

The first thing I want to do is add a Row_Number based on the times the wait stats were recorded, so that I can join and take the difference between one interval and the next. In theory it should be possible to do this with times, but the insert doesn’t occur at exactly the same time, to the millisecond, each interval, hence this would require fancy date manipulation. Easier to use a ROW_NUMBER

SELECT  WaitType,
        NumberOfWaits,
        SignalWaitTime,
        ResourceWaitTime,
        SampleTime,
        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
FROM    dbo.WaitStats
WHERE   WaitType IN ('LCK_M_IX', ‘PAGELATCH_EX’, 'LATCH_EX', 'IO_COMPLETION');

Next step, turn that into a CTE, join the CTE to itself with an offset and take the difference of the waiting tasks, the signal wait time and the resource wait time.

WITH    RawWaits
          AS (SELECT    WaitType,
                        NumberOfWaits,
                        SignalWaitTime,
                        ResourceWaitTime,
                        SampleTime,
                        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
              FROM      dbo.WaitStats
              WHERE     WaitType IN ('LCK_M_IX', ‘PAGELATCH_EX’, 'LATCH_EX', 'IO_COMPLETION')
             )
    SELECT  w1.SampleTime,
            w1.WaitType AS WaitType,
            w2.NumberOfWaits - w1.NumberOfWaits AS NumerOfWaitsInInterval,
            w2.ResourceWaitTime - w1.ResourceWaitTime AS WaitTimeInInterval,
            w2.SignalWaitTime - w1.SignalWaitTime AS SignalWaitTimeInInterval
    FROM    RawWaits w1
            LEFT OUTER JOIN RawWaits w2 ON w2.WaitType = w1.WaitType
                                           AND w2.Interval= w1.Interval + 1;

Last step, pivot the results. This will pivot and show the resource wait. Change the column that’s in the select and the pivot to show the others. It doesn’t matter what aggregation function is used because there’s only one value in each interval, so sum, avg, min and max will all give the same result (just, don’t use count)

WITH    RawWaits
          AS (SELECT    WaitType,
                        NumberOfWaits,
                        SignalWaitTime,
                        ResourceWaitTime,
                        SampleTime,
                        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
              FROM      dbo.WaitStats
              WHERE     WaitType IN ('LCK_M_IX', 'PAGELATCH_EX', 'LATCH_EX', 'IO_COMPLETION')
             ),
        WaitIntervals
          AS (SELECT    w1.SampleTime,
                        w1.WaitType AS WaitType,
                        w2.NumberOfWaits - w1.NumberOfWaits AS NumerOfWaitsInInterval,
                        w2.ResourceWaitTime - w1.ResourceWaitTime AS WaitTimeInInterval,
                        w2.SignalWaitTime - w1.SignalWaitTime AS SignalWaitTimeInInterval
              FROM      RawWaits w1
                        LEFT OUTER JOIN RawWaits w2 ON w2.WaitType = w1.WaitType
                                                       AND w2.Interval = w1.Interval + 1
             )
    SELECT  *
    FROM    (SELECT SampleTime, WaitType, WaitTimeInInterval FROM WaitIntervals
            ) p PIVOT ( AVG(WaitTimeInInterval) FOR WaitType IN ([LCK_M_IX], [PAGELATCH_EX], [LATCH_EX], [IO_COMPLETION]) ) AS pvt
    ORDER BY SampleTime;

And there we have a result that can easily be imported into excel (or R) and graphed or analysed further.

Q&A from the DBA Fundamentals Virtual Chapter presentation

A couple weeks ago I presented to the DBA Fundamentals virtual chapter. The presentation was recorded and is available from their site.

While I answered some questions during the presentation, I couldn’t answer all of them. Hence this blog post with the rest of the questions and some answers.

Q1: Is monitoring any different in Azure SQL DB?

A1: Completely different. What I was talking about when the question was asked was perfmon counters and wait stats. Since you don’t have access to the server with the SQL DB, you can’t run perfmon. Even if you could, there’s unknown other workloads on the server which would make any such monitoring useless. Instead you can use the DMV sys.dm_db_resource_stats, which gives you the resource consumption relative to the maximum allowed for the tier that you’re paying for. For more details, see https://azure.microsoft.com/en-us/blog/azure-sql-database-introduces-new-near-real-time-performance-metrics/

The wait stats can be monitored with the DMV sys.dm_db_wait_stats, instead of sys.dm_os_wait_stats that you’d use on an earthed SQL Server. See https://msdn.microsoft.com/en-us/library/dn269834.aspx

Q2: What interval should we use for perfmon and how long should it be run?

A2: Personally I’m happy using the 15 second default in most cases. Perfmon has minimal overhead and the files aren’t large. If I’m trying to pin down an intermittent issue I’ll reduce the time, but I’ll very rarely increase it.

When analysing a server, I want minimum a day and that’s bare minimum. A week is good, that way I can see trends over several days and not be caught out by any non-standard workloads on one day.

Q3: Use performance monitor or sys.dm_os_performance_counters

Perfmon. Running a job every 15 seconds is hard and only the SQL counters are available through the DMV, so I’ll just use a performance monitor counter trace and save out as a binary file.

Q4: Is high CXPacket a problem?

By itself, all CXPacket waits mean is that queries are running in parallel. To determine whether that’s a problem or not requires looking at the queries that are running in parallel and seeing whether they should be, or whether they should be serial.

Most cases I’ve seen recently with very high CXPacket waits and very high Access_Methods_Dataset_Parent latch waits have been a result of inefficient queries and poor indexing, not a problem with parallelism itself.

http://sqlperformance.com/2015/06/sql-performance/knee-jerk-wait-statistics-cxpacket

Q5: What should average PLE be?

The higher the better. It measures how long, on average, a page stays in cache. Lower numbers mean more churn of the buffer pool. There’s no one number where above is good and below is bad.

http://sqlperformance.com/2014/10/sql-performance/knee-jerk-page-life-expectancy

and other pointless query rewrites

In a similar vein to last week’s blog post… I heard an interesting comment recently. “Change that Column != 2 to a Column > 2 or Column < 2 combination, it can use indexes better.”

Sounds like something that clearly needs testing!

I’ll start with simple numbers table.

CREATE TABLE dbo.Numbers (
  Number INT NOT NULL
);

ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers
  PRIMARY KEY CLUSTERED (Number);

and put 1 million rows into it

INSERT INTO dbo.Numbers (Number)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2;

Let’s start, before we get into comparing things, with looking at the execution plan of a query with a != (or <>) operator.

SELECT Number
  FROM Numbers
  WHERE Number <> 12742; -- because 2 is on the first page of the index, and I don’t want any special cases here

InequalityExecutionPlan

That’s kinda complicated for a query with one table and one predicate. Let’s look at in pieces. The easiest place to start is the Clustered Index Seek. The seek predicate on the clustered index seek is

Seek Keys[1]: Start: [Test].[dbo].[Numbers].Number > Scalar Operator([Expr1009]), End: [Test].[dbo].[Numbers].Number < Scalar Operator([Expr1010])

Hmm…Looks like the parser/optimiser has already made our intended change for us. There’s some funky stuff in the top part of the plan, but what it’s essentially doing is generating two rows for the nested loop join, both with just the value that we’re excluding from the query, then the seek runs twice. I suspect that’s once for the less than 12742 and once for the greater than 12742 portions of the original predicate.

But, let’s do the full due diligence, the plan may not tell the whole story.

The performance numbers for the inequality form of the query, gathered via Extended Events and aggregated with Excel are:

Duration 122ms
CPU 105ms
Logical reads: 1619

This is our baseline, the numbers we’re comparing against. If the comment mentioned at the beginning is correct, then the revised query will have a significantly better performance.

The revised query is:

SELECT Number
  FROM Numbers
  WHERE Number > 12742 OR Number < 12742;

Execution plan is much simpler, no constant scans, no joins. Just a single index seek operation that executes once.

InequalityRevised

Is is better though?

Duration: 126ms
CPU: 103ms
Logical reads: 1619

No, it’s not.

Yes, we have a simpler plan, but we do not have a more efficient query. We’re still reading every page in the index and fetching all but one row of the table. The work required is the same, the performance characteristics are the same.

But, maybe, if the numbers aren’t unique and we’re excluding more than just one row it’ll be different.

That needs a slightly different table to test on.

CREATE TABLE MoreNumbers (
SomeNumber INT NOT NULL,
Filler CHAR(100)
);

CREATE CLUSTERED INDEX idx_MoreNumbers ON dbo.MoreNumbers(SomeNumber);
GO

INSERT INTO dbo.MoreNumbers (SomeNumber, Filler)
SELECT TOP (500000) NTILE(50) OVER (ORDER BY (SELECT 1)), ''
FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2;

I’m just going to look at the performance characteristics this time. The execution plans are the same as for the earlier query. The two queries are:

SELECT * FROM dbo.MoreNumbers WHERE SomeNumber != 24;
SELECT * FROM dbo.MoreNumbers WHERE SomeNumber < 24 OR SomeNumber > 24;

Query 1:

Duration 97ms
CPU 77ms
Logical Reads 7624

Query 2:

Duration 98ms
CPU 75ms
Logical Reads 7624

Again identical.

Just like with the pointless WHERE clause predicate last week, we have a query change that has had no effect on the query performance. Now, to be honest, there are some query form changes that can improve performance. For example, converting a set of OR predicates to UNION can improve query performance sometimes (and leave it unchanged in others), and so these kinds of rewrites do need to be tested to see if they’re useful.

More importantly though, those of us who are posting on forums and advising others have a responsibility to do these tests before we recommend changes to others, as they may very well not do them. If we don’t, we’re propagating myths and not helping the advancement of our field.

On the addition of useless where clauses

I remember a forum thread from a while back. The question was on how to get rid of the index scan that was in the query plan. Now that’s a poor question in the first place, as the scan might not be a problem, but it’s the first answer that really caught my attention.

Since the primary key is on an identity column, you can add a clause like ID > 0 to the query, then SQL will use an index seek.

Technically that’s correct. If the table has an identity column with the default properties (We’ll call it ID) and the clustered index is on that identity column, then a WHERE clause of the form WHERE ID > 0 AND <any other predicates on that table> can indeed execute with a clustered index seek (although it’s in no way guaranteed to do so). But is it a useful thing to do?

Time for a made up table and a test query.

CREATE TABLE dbo.Orders(
  OrderID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
  OrderDate DATETIME2(7) NULL,
  ShipmentRef CHAR(10) NULL,
  ShipmentDate DATE NULL,
  Status VARCHAR(20) NOT NULL
);

That’ll do the job. And then a few hundred thousand rows via SQL Data Generator and we’re good to go.

And for a query that has a nasty index scan, how about

SELECT  OrderDate,
    ShipmentDate,
    Status
  FROM    dbo.Orders
  WHERE   LTRIM(RTRIM(Status)) = 'Delivered';

ClusteredIndexScan

Now, that’s running as a clustered index scan because the predicate’s not SARGable and besides, there’s no index on that column, but let’s pretend we don’t know that.

If I add a WHERE clause predicate that filters no row out, can I get a query plan with an index seek?

SELECT  OrderDate,
    ShipmentDate,
    Status
  FROM    dbo.Orders
  WHERE   LTRIM(RTRIM(Status)) = 'Delivered'
    AND OrderID > 0;

Why yes, I can.

ClusteredIndexSeek

Op Success? Well…

The goal of performance tuning is to improve the performance of a query, not to change operators in a query plan. The plan is a tool, not a goal.

Have we, by adding a WHERE clause predicate that filters out no rows, improved performance of the query? This needs an extended events session to answer. Nothing fancy, just a sql_statement_completed event will do the trick.

I ran each query 10 times, copied the captured events into Excel and averaged them:

Query with just the LTRIM(RTRIM(Status)) = ‘Delivered’
CPU: 77ms
Duration: 543ms

Query with LTRIM(RTRIM(Status)) = ‘Delivered’ AND OrderID > 0
CPU: 80ms
Duration: 550ms

We haven’t tuned that query. I won’t say we’ve made it slower either, the differences are well within the error range on our measuring, but there’s definitely no meaningful performance gain.

There’s no gain because we haven’t changed how the query executes. A scan, and in this case it will be a scan of the entire index, will likely use the non-leaf levels of the b-tree to locate the logical first page of the leaf level, then will read the entire leaf level. The seek we managed to generate will use the b-tree to find the value 0 in the clustered index key, that’s what makes it a seek. Since the column is an identity starting at 1, that means the first row read will be on the logical first page of the leaf level, then it will read the entire leaf level.

Both will do the same amount of work, and so we haven’t done anything useful to the query by adding a WHERE clause that filters out no rows.

Scans are not always bad. If a query needs to read every row of a table, that’s a scan and effort shouldn’t be expended trying to make it an index seek.

To improve the performance of a query, we need to make changes that reduce the work needed to run the query. That often starts with reducing the amount of data that the query reads, by changing the query so that it can use indexes effectively and/or adding indexes to support the query. Not by adding pointless pieces to a query just to change plan operators from ones that are believed to be bad to ones that are believed to be good. Doing that is just a waste of time and effort.

Does an index scan always read the entire index?

No.

That’s a bit short for a blog post, so let me explain. First, the difference between a seek and a scan.

A seek is an operation which navigates down the index’s b-tree looking for a row or for the start/end of a range of rows. A seek requires a predicate and that predicate must be of the form that can be used as a search argument (SARGable).

A scan is a read of the leaf level of an index, possibly also reading the intermediate pages as well.

The key there is that a seek requires a predicate. If there’s no predicate, we cannot have a seek and hence must have a scan.

Let’s look at a couple of examples. I’m going to use a simple Numbers table as it’s perfectly adequate for what we’re doing here.

CREATE TABLE Numbers (
  Number INT NOT NULL PRIMARY KEY CLUSTERED
);

INSERT INTO Numbers (Number)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM msdb.sys.columns c1 CROSS JOIN msdb.sys.columns c2;

With the table created, I want to look at how many pages that clustered index has. It won’t be a huge number, the table is very small.

SELECT OBJECT_NAME(object_id) AS TableName, index_level, page_count, record_count, avg_record_size_in_bytes
  FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Numbers'),1,NULL, 'Detailed');

IndexPages

Three level deep clustered index, 1608 pages at the leaf and a total of 1614 pages in all levels.

Let’s start with something basic:

SET STATISTICS IO ON;
GO

SELECT * FROM Numbers;

FullTableScan

Table ‘Numbers’. Scan count 1, logical reads 1615, physical reads 0

That read the entire index, every page at every level. The extra page was likely the IAM page for that index. That’s kinda what we expect a scan to be.

For contrast, let’s try an obvious seek.

SELECT * FROM Numbers WHERE Number = 137;

Seek

Table ‘Numbers’. Scan count 0, logical reads 3, physical reads 0.

A single-row seek operation does three reads, which makes sense since we have a three-level deep clustered index.

Now, what about this?

SELECT TOP (1) * FROM Numbers;

It can’t be a seek operation, there’s no predicate. The only way this can be implemented is with a scan.

Top1Scan

It is indeed a scan, but did it read the entire table?

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

No. A scan of the entire index is over 1600 pages. This query read three. It’s a scan, but it’s a scan which stopped after reading one page of the leaf (the other two are likely the root and intermediate pages, used to locate the first page in the leaf).

The scan read one row and then stopped, because that’s all that was needed. It did that, because there was a 1 row row-goal added to the query. For more details on row goals, see Paul White’s article on the subject.

There’s other cases where a scan won’t read the entire index leaf level too.

Aggregations, well MIN and MAX of an indexed column:

SELECT MIN(Number) FROM Numbers;

IndexScanMin

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

EXISTS:

IF EXISTS(SELECT 1 FROM Numbers)
SELECT 1;

ExistsScan

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

In conclusion, a seek operation requires a predicate. Without a predicate, a query has to be evaluated with a scan, but a scan doesn’t always mean that the entire table is read.

Stop using Task Manager to check SQL’s memory usage!

There’s two fairly common questions I see on the forums around SQL Server’s memory usage. Either the question asks why SQL’s using too much memory, or why it’s using too little.

Too much memory isn’t usually a real problem, it’s often due to max server memory being left at its default of 2048TB, along with a lack of understanding of how SQL uses memory.

Too little memory used is also often not a real problem, rather it’s usually from using the wrong tools to check SQL Server’s memory usage.

Let’s start by looking at an example.

This particular server has 16 GB of memory, and SQL Server’s max server memory is set to 10GB. Since the last restart of the instance, I’ve run SELECT * FROM .. against every table in a 30GB database. That should have warmed the cache up nicely.

MemoryTaskManager

Err, what? SQL Server’s not even using 100MB? I’ve just read 30GB of data and SQL Server’s not even using 1% of it’s allowed memory!!!

Or is it?

TotalServerMemory

A look at perfmon shows a completely different picture. Total and target server memory are both at 10GB. So why is Task Manager showing such a low figure?

LockedPagesInErrorLog

The service account that SQL’s running under has been granted the Lock Pages in Memory permission. This means that SQL’s not using the normal Windows memory routines to allocate memory.

Normally, SQL Server and other Windows applications allocate memory using the VirtualAlloc API call. This allocates virtual memory which is pageable. When SQL Server has been given the Lock Pages in Memory permission, it doesn’t use VirtualAlloc to allocate memory. Instead it uses the API call AllocateUserPhysicalPages. Memory allocated with this API call is not pageable, it has to remain in physical memory.

Task Manager’s memory counters (the Working Set ones) only show memory that’s been allocated using VirtualAlloc. Hence, when SQL Server has the Lock Pages in Memory permission and is allocating most of its memory using AllocateUserPhysicalPages, its memory usage in Task Manager will appear to be extraordinarily low. As far as I recall, in that case it’s only the non-buffer memory which is still allocated with VirtualAlloc, and that’s just things like the thread stacks, the CLR memory, backup buffers, and similar. It’s not the buffer pool. The buffer pool gets allocated with AllocateUserPhysicalPages.

If I remove the Lock Pages in Memory and re-run the test, Task Manager shows completely different values

MemoryTaskManager_NoLockedPages

Now the buffer pool is being allocated with VirtualAlloc and so Task Manager shows the full 10GB of memory usage.

In summary, Task Manager can show a completely incorrect value for SQL Server’s memory usage if the SQL service account has the Lock Pages in Memory permission. This can lead to a lot of wasted time if it is concluded that Task Manager is correct and SQL Server is using little memory.

Rather leave Task Manager alone and use perfmon and the DMVs to check what SQL Server’s memory allocation actually is. They’ll both be correct whether Lock Pages are being used or not.

Natively compiled user-defined functions

One new thing that SQL Server 2016 has added is the ability to natively compile user-defined functions. Previously, native compilation, part of the hekaton feature, was limited to stored procedures.

When I saw that, the first question that came to mind is whether natively compiling a scalar function reduces the overhead when calling that function within another query. I’m not talking about data-accessing scalar UDFs, since natively compiled functions can only access in-memory tables, but functions that do simple manipulation of the parameters passed in. String formatting, for example, or date manipulation.

While not as harmful as data-accessing scalar UDFs, there’s still overhead as these are not inline functions, they’re called for each row in the resultset (as a look at the Stored Procedure Completed XE event would show), and the call to the function takes time. Admittedly not a lot of time, but when it’s on each row of a large resultset the total can be noticeable.

I decided to do a simple test. A query against a table with ~600k rows, one query with a traditional scalar function, one with a natively compiled function and one with the function’s contents in the query as a column.

The two functions:

CREATE FUNCTION dbo.DateOnly (@Input DATETIME)
RETURNS DATETIME
AS
BEGIN
  RETURN DATEADD(dd, DATEDIFF (dd, 0, @Input), 0);
END
GO

and

CREATE FUNCTION dbo.DateOnlyHekaton (@Input DATETIME)
RETURNS DATETIME
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'English')
  RETURN DATEADD(dd,DATEDIFF(dd,0,@Input),0);
END
GO

And the query without the function would be a simple

SELECT DATEADD(dd, DATEDIFF(dd, 0, SomeDate), 0)
FROM SomeTable;

Durations and CPU usage were caught with Extended Events. I ran each query 25 times and aggregated the results.

Average CPU (ms) Average Duration (ms)
In-line expression 289 294
Normal function 3555 3814
Natively Compiled Function 3318 3352

Not quite what I hoped. While the overhead of the natively compiled function is lower, it’s lower only by about 10%, which really is not worth it, now when we’re talking about an order of magnitude difference from the query without the function call.

Looks like the guidance is still going to be to not use scalar UDFs within other queries.

Book Review 2015

It’s been a few years since I wrote a review of books I’ve read, so this isn’t going to list everything I’ve read since 2012, just a few of the best books or series. For the entire list of all books, see the library page.

So, 2015. I set a goal of 50 (new) books to read in the year, same goal I’d set the previous few years. I read 70 in total. This doesn’t include re-reads of old favourites, just new books. For 2016 I’m going to set a goal of 75, that’s an average of 6.25 books a month, or 1.44 books a week.

Books2015Some of the best books or series of books I’ve read in the last year are:

Hounded (Iron Druid Chronicles)

Author: Kevin Hearne

This is a brilliant series. Atticus O’Sullivan is a druid, born during the Roman occupation of Britain and currently living in Arizona, running a second hand book shop. He’s hiding from a few deities (and other supernatural creatures) who he’s annoyed over the centuries, but his quiet life is about to come to an end.

Good character, good plotting, great humour. The series is a fantastic read.

Amazon link: Hounded

Dauntless (The Lost Fleet)

Author: Jack Campbell

Amazon link: Dauntless

Rivers of London

Probationary Constable Peter Grant is just trying to do his job, and avoid a transfer to the Case Progression Unit, when he interviews a murder witness who died a couple hundred years ago. The witness, not the victim. Then his life gets really strange.

This series is so much fun. British humour, magic, the weirdness that is London (and surrounds) and fun characters.

This is the first of a series. The other books are Moon over Soho, Whispers Underground, Broken Homes and Foxglove Summer. Rivers of London was also released under the title Midnight Riot.

Amazon link: Rivers of London

In Her Name: Redemption

Humans have been fighting a century-long war with the Kreegan Empire, and they’re losing. Reza Gard, orphaned at a young age due to the war, is kidnapped by the Kreegans and forced to live among his enemies, but he may hold the key the fulfilment of a 10 000 year old Kreegan prophecy, and to the end of the war.

Excellent science fiction, well plotted, good characters.

Amazon link: Redemption. This is the complete text of the Redemption trilogy, Empire, Confederation and Final Battle. There’s a trilogy that discusses the start of the war, The Last War, comprised of First Contact, Legend of the Sword and Dead Soul, but I didn’t enjoy it quite as much. First Contact is excellent, the other two aren’t quite as good.

Ready Player One

A story of a search through an online virtual world for a massive prize, nothing less than the complete ownership of that virtual world and several billion dollars. Lots and lots of geek references, games, TV shows, movies, etc.

Amazon link: Ready Player One

The Laundry Files

What if magic was real? What if magic could be done by solving certain mathematical equations? What if that meant that computer programmers were actually magicians?

That’s the premise of the Laundry Files. Magic is a branch of applied mathematics, and computers are really good at maths. Unfortunately, such magic tends to have unpleasant consequences, like interdimensional horrors eating your brain. Hence existence of a government department trying to keep magic under control, and put off the end of the world for a few more years.

This is another series set (mostly) in London with more typical British humour.

Amazon Link: The Atrocity Archives.

The Stormlight Archive

Brandon Sanderson did a wonderful job finishing off the Wheel of Time series, now he’s starting his own epic fantasy series, and it’s good.

The world of Roshar is scoured by hurricane-strength storms that cross from the east to west every few days. On this battered world kings and armies fight for power, for wealth and for legendary weapons called Shardblades. But there’s hints from various sources that something terrible is coming, something that will tear the world apart.

Amazon Link: The Way of Kings. Depending on the publisher, the books are either sold in 2 parts or as a single book. The two published titles of the Stormlight Archive are “The Way of Kings” and “Words of Radiance”.

When naming transactions causes an error

For the last part of the series on transactions, I’m going to look at a problem that I ran across entirely by chance while doing some performance tuning. A procedure had a transaction in it, and it was named. Something like this:

CREATE PROCEDURE RemoveUser (@UserID INT)
AS

BEGIN TRY
  BEGIN TRANSACTION Important
    DELETE FROM Users WHERE UserID = @UserID
  COMMIT TRANSACTION Important
END TRY
BEGIN CATCH
  ROLLBACK TRANSACTION Important
END CATCH
GO

The error’s squelched, not handled, but let’s ignore that, it’s not the point of this post. The example as written doesn’t need a transaction, but the real procedure was a tad more complex. The transaction is named, and the name is specified in the COMMIT and ROLLBACK as well.

Before I go into the problem with this code, I’m going to quote from three Books Online entries.

BEGIN TRANSACTION

transaction_name

Is the name assigned to the transaction. transaction_name must conform to the rules for identifiers, but identifiers longer than 32 characters are not allowed. Use transaction names only on the outermost pair of nested BEGIN…COMMIT or BEGIN…ROLLBACK statements. transaction_name is always case sensitive, even when the instance of SQL Server is not case sensitive.

COMMIT TRANSACTION

transaction_name

Is ignored by the SQL Server Database Engine. transaction_name specifies a transaction name assigned by a previous BEGIN TRANSACTION. transaction_name must conform to the rules for identifiers, but cannot exceed 32 characters. transaction_name can be used as a readability aid by indicating to programmers which nested BEGIN TRANSACTION the COMMIT TRANSACTION is associated with.

ROLLBACK TRANSACTION

transaction_name

Is the name assigned to the transaction on BEGIN TRANSACTION. transaction_name must conform to the rules for identifiers, but only the first 32 characters of the transaction name are used. When nesting transactions, transaction_name must be the name from the outermost BEGIN TRANSACTION statement. transaction_name is always case sensitive, even when the instance of SQL Server is not case sensitive.

I do love how Books Online refers to nested transactions as though they were real things…

Now that we’re perused the documentation, let’s go back to the code. As-is it runs fine, however let’s try that procedure called from within another transaction.

BEGIN TRANSACTION
  EXEC dbo.RemovePosts @UserID = 152;
  EXEC dbo.RemoveThreads @UserID = 152;
  EXEC dbo.RemoveUser @UserID = 152;
COMMIT

Some sort of ‘remove all my details’ functionality, except that I’ve ‘forgotten’ to check one foreign key relationship and so the delete inside the inner transaction is going to throw an error. Execution is going to be transferred to the CATCH block and the ROLLBACK TRANSACTION is going to run.

Msg 6401, Level 16, State 1, Procedure RemoveUser, Line 10
Cannot roll back Important. No transaction or savepoint of that name was found.

Whoops.

The documentation for ROLLBACK TRANSACTION states ‘When nesting transactions, transaction_name must be the name from the outermost BEGIN TRANSACTION statement.’. In this case a second transaction has been wrapped around the first, the outer transaction had no name and hence an error occurred.

No big deal though, right? It’s just a different error being thrown, right? It’s not as if we were handling and logging the error properly in the first place.

Not quite.

OpenTransaction

The error was thrown by the ROLLBACK statement. As such, the transaction is still open, the locks are held and the transaction log space can’t be reused. Unless the application that called this was checking for open transactions, that transaction could potentially be left open for quite some amount of time, causing blocking and/or the transaction log to grow.

It’s not just that someone in the future might call the code from another stored proc within a transaction, it’s also that it might be that the code is called from an application which started a transaction. Or called from SSIS which started a transaction. It’s very hard to ensure that code is never called from within an existing transaction

In conclusion, if you want to put a name on a BEGIN TRANSACTION, for documentation purposes or to help with debugging, go ahead. If you want to put a name on a COMMIT, that’s fine too, although it has no effect at all on how the code runs. Don’t put names on the ROLLBACK ever, as it leaves the code with a potential hidden bug.

This concludes the short series on transactions. The full list is:

  1. A Mess of Nested Transactions
  2. Savepoints and conditional transactions
  3. Why would you want to name a transaction?
  4. When naming transactions causes an error (this post)

Why would you want to name a transaction?

Something that i keep seeing in documentation, in forum code and in real systems is transactions that have names

BEGIN TRANSACTION VeryImportantProcess

<do stuff>

COMMIT VeryImportantProcess

Now the question came up on the forums a while back as to what the point of this is. Why would you name a transaction, what effect does it have on behaviour and how is it treated internally?

In the first post in this series I had a look at nested transactions and in that post we saw that nested transactions don’t have any effect on how the transactions are treated. The BEGIN TRANSACTION statement isn’t even logged if there’s already an open transaction in that session and a ROLLBACK still rolls everything back (unless it’s specifically rolling back to a savepoint)

First question then regarding naming of transactions is whether it has any effect at all on that behaviour.

CREATE TABLE TestTrans (
ID int,
InsertDate DATETIME
);

BEGIN TRANSACTION OuterTran
  INSERT INTO TestTrans (ID, InsertDate) VALUES (1,GETDATE())
  BEGIN TRANSACTION InnerTran
    INSERT INTO TestTrans (ID, InsertDate) VALUES (2,GETDATE())
  COMMIT TRANSACTION InnerTran
ROLLBACK TRANSACTION OuterTran

SELECT * FROM TestTrans

NoRecords

Clearly none. Even with all the transactions named, a commit of a named transaction did not commit anything until the transaction nest level reached 0. A rollback still rolls back everything.

Ok, so maybe there’s some internal usage for a transaction’s name. Let’s have a look at the transaction log to see.

CHECKPOINT;

BEGIN TRANSACTION OuterTran
  INSERT INTO TestTrans (ID, InsertDate) VALUES (1,GETDATE())
  BEGIN TRANSACTION InnerTran
    INSERT INTO TestTrans (ID, InsertDate) VALUES (2,GETDATE())
  COMMIT TRANSACTION InnerTran
ROLLBACK TRANSACTION OuterTran

SELECT * FROM fn_DBLog(NULL, NULL)

TranLogContents

The outer transaction’s name appears in the log, replacing what was ‘user_transaction’ previously, but other than that there’s no differences. The inner transaction still doesn’t appear anywhere in the log, neither does the inner commit, and the rollback doesn’t reference the transaction name anywhere.

So what conclusion can we come to here? Pretty much that naming of transactions has one real use, as a form of documentation. Instead of putting a comment above an BEGIN TRANSACTION we can give the transaction a name that indicates what the transaction does, That’s about the only real use.

With two exceptions.

There’s a seldom used option on a BEGIN TRANACTION, an optional WITH clause, Using that along with the transaction name can have some uses in exceptional circumstances.

BEGIN TRANSACTION VeryImportantProcess WITH MARK

<do stuff>

COMMIT TRANSACTION VeryImportantProcess

Now if we run an example of that and look into the log, there’s something different there. Interestingly, the difference isn’t on the BEGIN_XACT, it’s on the COMMIT_XACT log record.

2015_10_05_15_55_36_SQLQuery1.sql_EXCALIBUR.TransactionDemo_Excalibur_Gail_54_Microsoft_SQL_

The question still stands though, why do this? Why name a transaction and put a mark of that into the transaction log. What  benefit does it give?

The mark can be used with a rather poorly known and seldom used pair of options on the restore command (when restoring transaction log backups)

  • WITH STOPAFTERMARK
  • WITH STOPBEFOREMARK

These two, similar to the better known STOPAT can be used to stop a transaction log restore part way through a log backup. Unlike STOPAT, they don’t take a time, but rather take a string, a name that was used for a marked transaction name.

Now to be honest there’s not all that much use for this. If a user drops a table or otherwise messes data up requiring a restore to the point before that happened they’re not going to be nice and run that accidental data mess up in a named, marked transaction.

Where it can be useful is when the transaction crossed multiple databases and/or multiple servers. The log mark appears in all of them and can therefore be used to restore multiple databases to a consistent state where that consistent state is not a single point in time.

Still, it’s useful to have in the toolbox just in case some circumstance comes up where it’s needed.

Edit: It was pointed out, both on twitter and in the comments, that adding a name to a transaction reflects in the transaction-related DMVs and potentially makes debugging easier. Absolutely, it can do and it’s a valid reason to name transactions, I’m lumping that in with ‘documentation’ in the discussions here.

There’s another case where naming transactions can affect behaviour and in this case usually in a bad way. I’ll look at in in more detail in part 4 of this series, but for now will just reference the relevant segment in Books Online:

transaction_name Is the name assigned to the transaction on BEGIN TRANSACTION. transaction_name must conform to the rules for identifiers, but only the first 32 characters of the transaction name are used. When nesting transactions, transaction_name must be the name from the outermost BEGIN TRANSACTION statement. transaction_name is always case sensitive, even when the instance of SQL Server is not case sensitive.

The full list is:

  1. A Mess of Nested Transactions
  2. Savepoints and conditional transactions
  3. Why would you want to name a transaction? (This post)
  4. When naming transactions causes an error