T-SQL

Revisiting catch-all queries

I originally wrote about catch-all queries early in 2009, just as something that I’d seen several times in client code. It turned into the 3rd most popular post ever on my blog.

A lot’s changed since 2009. When I wrote the original post, most production servers were SQL 2005 or SQL 2000. SQL 2008 had been out less than a year and its fix for catch-all queries, the RECOMPILE hint, didn’t even work properly (it had an incorrect results bug in RTM, was pulled in SP1 and fixed in SP2)

As such, my feelings on how to solve the problem with catch-all queries has changed over the years.

Before I get to solutions, let’s start with the root cause of the problem with catch-all queries – plan caching and the need for plans to be safe for reuse.

Let’s take a sample query. I’ll use the same one I used in the original post.

CREATE PROCEDURE SearchHistory
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE (ProductID = @Product Or @Product IS NULL)
AND (ReferenceOrderID = @OrderID OR @OrderID Is NULL)
AND (TransactionType = @TransactionType OR @TransactionType Is NULL)
AND (Quantity = @Qty Or @Qty is null)
GO

There are two nonclustered indexes on the TransactionHistory table, one on ProductID, one on ReferenceOrderID and ReferenceLineID.

For the initial discussion, let’s just consider two of the clauses in the WHERE. I’ll leave the other two in the stored proc, but they won’t be used.

WHERE (ProductID = @Product Or @Product IS NULL)
AND (ReferenceOrderID = @OrderID OR @OrderID Is NULL)

We would expect, if the ProductID parameter is passed, to get a seek using the index on ProductID, if the ReferenceOrderID parameter is passed, to get a seek using the index on ReferenceOrderID, and if both are passed, then either an index intersection or a seek on one of the indexes, key lookup and secondary filter for the other, plus, in all cases, a key lookup to fetch the columns for the SELECT.

That’s not what we get (I cleared the plan cache before running each of these).

ProductScan

OrderScan

The expected indexes are used, but they’re used for scans not seeks. Why? Let’s just consider the second plan for a bit.

The index aren’t used for seeks, because plans must be safe for reuse. If a plan was generated with an index seek, seeking for ReferenceOrderID = @OrderID, and that plan was cached and reused later when @OrderID was NULL, we’d get incorrect results. ReferenceOrderID = NULL matches no records.

And so we have index scans with the full predicate (ReferenceOrderID = @OrderID OR @OrderID Is NULL) applied after the index is read.

This is not particularly efficient, as the properties on the index seek shows.

InefficientIndexScan

The entire index, all 113443 rows were read, to return a single row. Not ideal, but it’s far from the largest problem with this form of query.

The plan’s got an index scan on the index on ReferenceOrderID, and then a key lookup back to the clustered index. That key lookup has a secondary filter on it, (ProductID = @Product Or @Product IS NULL). The optimiser assumed that a small number of rows would be returned from the index seek on ReferenceOrderID (1.47 to be specific), and hence the key lookup would be cheap, but that’s not going to be the case if the plan is reused with a ProductID passed to it instead of a ReferenceOrderID.

Before we look at that, the performance characteristics for the procedure being called with the ReferenceOrderID parameter are:

PerformanceOrder

The duration and CPU are both in microseconds, making this a very fast query, despite the index scan.

Now, without clearing the plan cache, I’m going to run the procedure with only the ProductID parameter passed.

PerformanceProduct

CPU’s gone from an average of 8ms to around 120ms. Duration has gone from average around 6ms to about 125ms and reads have jumped from 271 (2 MB of data processed) to 340 597 (2.6 GB of data processed)

And this is for a table that has 113k records and a query that returned 4 rows.

The key lookup, which was fine when an OrderID was passed, is not fine when @OrderID is NULL and the index scan returns the entire table.

ExpensiveIndexScan

ExpensiveKeyLookup

The plans that the optimiser has come up with for this query form aren’t stable. They’re safe for reuse, they have to be, but performance-wise they’re not stable.

But, maybe it’s just this form of query, there are other ways to write queries with multiple optional parameters.

Let’s try the CASE and COALESCE forms.

CREATE PROCEDURE SearchHistory_Coalesce
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE ProductID = COALESCE(@Product, ProductID)
AND ReferenceOrderID = COALESCE(@OrderID, ReferenceOrderID)
AND TransactionType = COALESCE(@TransactionType, TransactionType)
AND Quantity = COALESCE(@Qty, Quantity)
GO

CREATE PROCEDURE SearchHistory_Case
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT ProductID, ReferenceOrderID, TransactionType, Quantity,
TransactionDate, ActualCost
FROM Production.TransactionHistory
WHERE ProductID = CASE WHEN @Product IS NULL THEN ProductID ELSE @Product END
AND ReferenceOrderID = CASE WHEN @OrderID IS NULL THEN ReferenceOrderID ELSE @OrderID END
AND TransactionType = CASE WHEN @TransactionType IS NULL THEN TransactionType ELSE @TransactionType END
AND Quantity = CASE WHEN @Qty IS NULL THEN Quantity ELSE @Qty END
GO

Coalesce

Case

These both give us full table scans, rather than the index scan/key lookup we saw earlier. That means their performance will be predictable and consistent no matter what parameter values are used. Consistently bad, but at least consistent.

It’s also worth noting that neither of these will return correct results if there are NULL values in the columns used in the WHERE clause (because NULL != NULL). Thanks to Hugo Kornelis (b | t) for pointing this out.

And then two more forms that were mentioned in comments on the original post, slightly more complicated:

CREATE PROCEDURE SearchHistory_Case2
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT  ProductID,
ReferenceOrderID,
TransactionType,
Quantity,
TransactionDate,
ActualCost
FROM    Production.TransactionHistory
WHERE   (CASE WHEN @Product IS NULL THEN 1
WHEN @Product = ProductID THEN 1
ELSE 0
END) = 1
AND (CASE WHEN @OrderID IS NULL THEN 1
WHEN @OrderID = ReferenceOrderID THEN 1
ELSE 0
END) = 1
AND (CASE WHEN @TransactionType IS NULL THEN 1
WHEN @TransactionType = TransactionType THEN 1
ELSE 0
END) = 1
AND (CASE WHEN @Qty IS NULL THEN 1
WHEN @Qty = Quantity THEN 1
ELSE 0
END) = 1
GO

CREATE PROCEDURE SearchHistory_Complex
(@Product int = NULL, @OrderID int = NULL, @TransactionType char(1) = NULL, @Qty int = NULL)
AS
SELECT  ProductID,
ReferenceOrderID,
TransactionType,
Quantity,
TransactionDate,
ActualCost
FROM    Production.TransactionHistory
WHERE ((ProductID = @Product AND @Product IS NOT NULL) OR (@Product IS NULL))
AND ((ReferenceOrderID = @OrderID AND @OrderID IS NOT NULL) OR (@OrderID IS NULL))
AND ((TransactionType = @TransactionType AND @TransactionType IS NOT NULL) OR (@TransactionType IS NULL))
AND ((Quantity = @Qty AND @Qty IS NOT NULL) OR (@Qty IS NULL))

These two give the same execution plans as the first form we looked at, index scan and key lookup.

Performance-wise, we’re got two different categories of query. We’ve got some queries where the execution plan contains an index scan on one or other index on the table (depending on parameters passed) and a key lookup, and others where the execution plan contains a table scan (clustered index scan) no matter what parameters are passed.

But how do they perform? To test that, I’m going to start with an empty plan cache and run each query form 10 times with just the OrderID being passed and then 10 times with just the ProductID being passed, and aggregate the results.

Procedure Parameter CPU (ms) Duration (ms) Reads
SearchHistory OrderID 5.2 50 271
ProductID 123 173 340597
SearchHistory_Coalesce OrderID 7.8 43 805
ProductID 9.4 45 805
SearchHistory_Case OrderID 12.5 55 805
ProductID 7.8 60 804
SearchHistory_Case2 OrderID 10.5 48 272
ProductID 128 163 340597
SearchHistory_Complex OrderID 7.8 40 272
ProductID 127 173 340597

 

The query forms that had the clustered index scan in the plan have consistent performance. On large tables it will be consistently bad, it is a full table scan, but it will at least be consistent.

The query form that had the key lookup have erratic performance, no real surprise there, key lookups don’t scale well and looking up every single row in the table is going to hurt. And note that if I ran the queries in the reverse order on an empty plan cache, the queries with the ProductID passed would be fast and the queries with the OrderID would be slow.

So how do we fix this?

When I first wrote about this problem 7 years ago, I recommended using dynamic SQL and discussed the dynamic SQL solution in detail. The dynamic SQL solution still works very well, it’s not my preferred solution any longer however.

What is, is the RECOMPILE hint.

Yes, it does cause increased CPU usage due to the recompiles (and I know I’m likely to get called irresponsible and worse for recommending it), but in *most* cases that won’t be a huge problem. And if it is, use dynamic SQL.

I recommend considering the RECOMPILE hint first because it’s faster to implement and far easier to read. Dynamic SQL is harder to debug because of the lack of syntax highlighting and the increased complexity of the code. In the last 4 years, I’ve only had one case where I went for the dynamic SQL solution for a catch-all query, and that was on a server that was already high on CPU, with a query that ran many times a second.

From SQL 2008 SP2/SQL 2008 R2 onwards, the recompile hint relaxes the requirement that the generated plan be safe for reuse, since it’s never going to be reused. This firstly means that the plans generated for the queries can be the optimal forms, index seeks rather than index scans, and secondly will be optimal for the parameter values passed.

CatchallIndexSeek

And performance-wise?

RecompilePerformance

Reads down, duration down and CPU down even though we’re recompiling the plan on every execution (though this is quite a simple query, so we shouldn’t expect a lot of CPU to generate the plan).

How about the other forms, do they also improve with the RECOMPILE hint added? As I did before, I’m going to run each 10 times and aggregate the results, that after adding the RECOMPILE hint to each.

Procedure Parameter CPU (ms) Duration (ms) Reads
SearchHistory OrderID 0 1.3 28
ProductID 0 1.2 19
SearchHistory_Coalesce OrderID 6.2 1.2 28
ProductID 3.2 1.2 19
SearchHistory_Case OrderID 1.6 1.3 28
ProductID 0 1.2 19
SearchHistory_Case2 OrderID 7.8 15.6 232
ProductID 7.8 11.7 279
SearchHistory_Complex OrderID 1.5 1.4 28
ProductID 0 1.2 19

 

What can we conclude from that?

One thing we note is that the second form of case statement has a higher CPU, duration and reads than any other. If we look at the plan, it’s still running as an index scan/key lookup, despite the recompile hint.

The second thing is that the more complex forms perform much the same as the simpler forms, we don’t gain anything by adding more complex predicates to ‘guide’ the optimiser.

Third, the coalesce form might use slightly more CPU than the other forms, but I’d need to test a lot more to say that conclusively. The numbers we’ve got are small enough that there might well be measuring errors comparable to the number itself.

Hence, when this query form is needed, stick to the simpler forms of the query, avoid adding unnecessary predicates to ‘help’ the optimiser. Test the query with NULLs in the filtered columns, make sure it works as intended.

Consider the RECOMPILE hint first, over dynamic SQL, to make it perform well. If the query has long compile times or runs very frequently, then use dynamic SQL, but don’t automatically discount the recompile hint for fear of the overhead. In many cases it’s not that bad.

Homebuilt sequential columns

I gave my introductory session on transactions at all three of the South African SQL Saturdays in 2016, as well as at SQL Saturday Oregon in October 2017, and something that came up in most of them was the ‘manual sequence’, the idea of using a column in a table to store a max value and using that in place of an identity column or sequence object.

To be clear, I don’t think this is a good idea. The identity column works well if a sequential series of numbers are needed. If the number sequence needs to that spans tables, then the sequence object is a good replacement.

But, there will always be some requirements that insist on gap-less sequences, or insist on not using identity (probably for ‘compatibility’ reasons), so let’s see how to do it properly.

To start, the common attempt (taken from a random Stack Overflow answer)

DECLARE @next INT
SET @next = (SELECT (MAX(id) + 1) FROM Table1)

INSERT INTO Table1
VALUES (@next)

or, a slightly different form

DECLARE @next INT
SELECT @next = SequenceNumber + 1 FROM Table1

UPDATE Table1
SET SequenceNumber = @Next;

— Then use @Next in another table for an insert

This doesn’t work. Oh, to be sure it’ll work in testing, but once we get some heavy concurrent access, its flaws become apparent.

To test the first one, I’m going to use a table that just has an ID (supposed to be unique) and a second column to record which session_id did the insert

CREATE TABLE TestSequence (
ManualID INT NOT NULL,
SessionID INT
)

And then run this 100 times from 10 different sessions

DECLARE @next INT
SET @next = (SELECT (MAX(ManualID) + 1) FROM TestSequence)

INSERT INTO TestSequence
VALUES (@next, @@SPID)

Duplicates

And it doesn’t work because the select statement takes a shared lock. Shared locks are shared, and so multiple sessions can read the same max value from the table, then write back that same value+1 to the table, either generating duplicate rows or primary key/unique constraint violations (hopefully the latter)

So how do we fix it?

One option is to wrap the two statement in a transaction and add the UPDLOCK hint to the select. This ensures that no one else will be able to read the same max value from the table, but depending on indexes it could also cause some blocking and resultant slow queries.

Another way is to make the insert (or update) and the select a single atomic operation, by returning the inserted (or updated) value from the insert (or update) statement. We can use the OUTPUT clause for this.

Option one would have code similar to this:

BEGIN TRANSACTION

DECLARE @next INT;
SET @next = (SELECT (MAX (ManualID) + 1) FROM TestSequence WITH (TABLOCKX, HOLDLOCK));

INSERT  INTO TestSequence
VALUES  (@next, @@SPID);

COMMIT TRANSACTION

And option 2 looks like

INSERT INTO TestSequence
OUTPUT inserted.ManualID
SELECT MAX(ManualID) + 1 FROM TestSequence WITH (TABLOCKX, HOLDLOCK)

The locking hints are, unfortunately, necessary. I tried several variations with less restrictive hints and they either:
– Produced duplicates
– Deadlocked when the table was small
– Deadlocked all the time

None of which are desired, hence the use of an exclusive table lock to serialise access. Of course, the restrictive locks will make this slow under concurrent usage. An index on ManualID will help, a bit.

Now we can test both of those the same way we tested the first version. An easy way to see whether there are any duplicates is to check the count and the distinct count.

DuplicateCheck

To reiterate something I said earlier, I do not recommend using this. Identity columns, with their gaps, are fine for the majority of cases, especially the use of them for artificial primary keys. Artificial keys, if used, are meaningless numbers that should not be exposed to users, and hence gaps should be irrelevant.

The need for a gap-less sequence, stored in the table, should be an exceptional one, not a common one.

What is a SARGable predicate?

‘SARGable’ is a weird term. It gets bandied around a lot when talking about indexes and whether queries can seek on indexes. The term’s an abbreviation, ‘SARG’ stands for Search ARGument, and it means that the predicate can be executed using an index seek.

Lovely. So a predicate must be SARGable to be able to use an index seek, and it must be able to use an index seek to be SARGable. A completely circular definition.

So what does it actually mean for a predicate to be SARGable? (and we’ll assume for this discussion that there are suitable indexes available)

The most general form for a predicate is <expression> <operator> <expression>. To be SARGable, a predicate must, on one side, have a column, not an expression on a column. So, <column> <operator> <expression>

SELECT * FROM Numbers
WHERE Number = 42;

Seek1

SELECT * FROM Numbers
WHERE Number + 0 = 42;

Scan1

SELECT * FROM Numbers
WHERE Number = 42 + 0;

Seek2

Any1 function on a column will prevent an index seek from happening, even if the function would not change the column’s value or the way the operator is applied, as seen in the above case. Zero added to an integer doesn’t change the value of the column, but is still sufficient to prevent an index seek operation from happening.

While I haven’t yet found any production code where the predicate is of the form ‘Column + 0’ = @Value’, I have seen many cases where there are less obvious cases of functions on columns that do nothing other than to prevent index seeks.

UPPER(Column) = UPPER(@Variable) in a case-insensitive database is one of them, RTRIM(COLUMN) = @Variable is another. SQL ignores trailing spaces when comparing strings.

The other requirement for a predicate to be SARGable, for SQL Server at least, is that the column and expression are of the same data type or, if the data types differ, such that the expression will be implicitly converted to the data type of the column.

SELECT 1 FROM SomeTable
WHERE StringColumn = 0;

Scan2

SELECT 1 FROM SomeTable
WHERE StringColumn = ‘0’;

Seek3

There are some exceptions here. Comparing a DATE column to a DATETIME value would normally implicitly convert the column to DATETIME (more precise data type), but that doesn’t cause index scans. Neither does comparing an ascii column to a unicode string, at least in some collations.

In generally though, conversions should be explicit and decided on by the developer, not left up to what SQL server decides.

What about operators?

The majority are fine. Equality, Inequality, IN (with a list of values), IS NULL all allow index usage. EXIST and IN with a subquery are treated like joins, which may or may not use indexes depending on the join type chosen.

LIKE is a slight special case. Predicates with LIKE are only SARGable if the wildcard is not at the start of the string.

SELECT 1 FROM SomeStrings
WHERE ASCIIString LIKE 'A%'

Seek4

SELECT 1 FROM SomeStrings
WHERE ASCIIString LIKE '%A'

Scan3

There are blog posts that claim that adding NOT makes a predicate non-SARGable. In the general case that’s not true.

SELECT * FROM Numbers
WHERE NOT Number > 100;

Seek5

SELECT * FROM Numbers
WHERE NOT Number <= 100;

Seek6

SELECT * FROM Numbers
WHERE NOT Number = 137;

Seek7

These index seeks are returning most of the table, but there’s nothing in the definition of ‘SARGable’ that requires small portions of the table to be returned.

That’s mostly that for SARGable in SQL Server. It’s mostly about having no functions on the column and no implicit conversions of the column.

(1) An explicit CAST of a DATE column to DATETIME still leaves the predicate SARGable. This is an exception that’s been specifically coded into the optimiser.

SQL Server 2016 features: Temporal Tables

Another new feature in SQL 2016 is the Temporal Table (or System Versioning, as its referred to in the documentation). It allows a table to be versioned, in terms of data, and for queries to access rows of the table as they were at some earlier point in time,

I’m quite excited about this, because while we’ve always been able to do this manually, with triggers or CDC or CT, it’s been anything but trivial. I remember trying to implement a form of temporal tables back in SQL 2000, using triggers, and it was an absolute pain in the neck.

So how does it work? Let’s start with a normal un-versioned table.

CREATE TABLE dbo.Stock (
  StockReferenceID INT IDENTITY(1, 1) NOT NULL,
  IssueID INT NULL,
  Condition VARCHAR(10) NULL,
  AvailableQty SMALLINT NULL,
  Price NUMERIC(8, 2) NULL,
  PRIMARY KEY CLUSTERED (StockReferenceID ASC)
);

To make that a temporal table, we need to add two columns, a row start date and a row end date.

ALTER TABLE Stock
  ADD PERIOD FOR SYSTEM_TIME (RowStartDate, RowEndDate),
    RowStartDate DATETIME2 GENERATED ALWAYS AS ROW START HIDDEN NOT NULL DEFAULT GETDATE(),
    RowEndDate DATETIME2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL DEFAULT CONVERT(DATETIME2, '9999-12-31 23:59:59.99999999');

It’s a little complicated. From my, admittedly limited, testing, the NOT NULL and the DEFAULT are required. The start time’s default needs to be GETDATE() and the end time’s default needs to be the max value of the data type used.

Hidden is an interesting property, it means that the columns won’t appear if SELECT * FROM Stock… is run. The only way to see the column values is to explicitly state them in the SELECT clause.

TemporalHiddenColumns_thumb.png

I wonder if that property will be available for other columns in the future. It would be nice to be able to mark large blob columns as HIDDEN so that SELECT * doesn’t end up pulling many MB per row back.

That ALTER adds the row’s two time stamps. To enable the versioning then just requires

ALTER TABLE Stock
  SET (SYSTEM_VERSIONING = ON);

Once that’s done, the table gains a second, linked, table that contains the history of the rows.

TemporalTable-Object-Explorer_thumb.png

421576540 is the object_id for the Stock table. If is also possible to specify the name for the history table in the ALTER TABLE statement, if preferred.

The history table can be queried directly. The start and end times aren’t hidden in this one.

TemporalHistoryTable.png

Or, the temporal table can be queried with a new clause added to the FROM, FOR SYSTEM TIME… Full details at https://msdn.microsoft.com/en-us/library/mt591018.aspx

HistoryOfRow

Very neat.

The one thing that does need mentioning. This is not an audit solution. If the table is altered, history can be lost. If the table is dropped, history is definitely lost. Auditing requirements are such that audit records should survive both. Use this for historical views of how the data looked, and if you need an audit as well, look at something like SQLAudit.

and other pointless query rewrites

In a similar vein to last week’s blog post… I heard an interesting comment recently. “Change that Column != 2 to a Column > 2 or Column < 2 combination, it can use indexes better.”

Sounds like something that clearly needs testing!

I’ll start with simple numbers table.

CREATE TABLE dbo.Numbers (
  Number INT NOT NULL
);

ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers
  PRIMARY KEY CLUSTERED (Number);

and put 1 million rows into it

INSERT INTO dbo.Numbers (Number)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2;

Let’s start, before we get into comparing things, with looking at the execution plan of a query with a != (or <>) operator.

SELECT Number
  FROM Numbers
  WHERE Number <> 12742; -- because 2 is on the first page of the index, and I don’t want any special cases here

InequalityExecutionPlan

That’s kinda complicated for a query with one table and one predicate. Let’s look at in pieces. The easiest place to start is the Clustered Index Seek. The seek predicate on the clustered index seek is

Seek Keys[1]: Start: [Test].[dbo].[Numbers].Number > Scalar Operator([Expr1009]), End: [Test].[dbo].[Numbers].Number < Scalar Operator([Expr1010])

Hmm…Looks like the parser/optimiser has already made our intended change for us. There’s some funky stuff in the top part of the plan, but what it’s essentially doing is generating two rows for the nested loop join, both with just the value that we’re excluding from the query, then the seek runs twice. I suspect that’s once for the less than 12742 and once for the greater than 12742 portions of the original predicate.

But, let’s do the full due diligence, the plan may not tell the whole story.

The performance numbers for the inequality form of the query, gathered via Extended Events and aggregated with Excel are:

Duration 122ms
CPU 105ms
Logical reads: 1619

This is our baseline, the numbers we’re comparing against. If the comment mentioned at the beginning is correct, then the revised query will have a significantly better performance.

The revised query is:

SELECT Number
  FROM Numbers
  WHERE Number > 12742 OR Number < 12742;

Execution plan is much simpler, no constant scans, no joins. Just a single index seek operation that executes once.

InequalityRevised

Is is better though?

Duration: 126ms
CPU: 103ms
Logical reads: 1619

No, it’s not.

Yes, we have a simpler plan, but we do not have a more efficient query. We’re still reading every page in the index and fetching all but one row of the table. The work required is the same, the performance characteristics are the same.

But, maybe, if the numbers aren’t unique and we’re excluding more than just one row it’ll be different.

That needs a slightly different table to test on.

CREATE TABLE MoreNumbers (
SomeNumber INT NOT NULL,
Filler CHAR(100)
);

CREATE CLUSTERED INDEX idx_MoreNumbers ON dbo.MoreNumbers(SomeNumber);
GO

INSERT INTO dbo.MoreNumbers (SomeNumber, Filler)
SELECT TOP (500000) NTILE(50) OVER (ORDER BY (SELECT 1)), ''
FROM msdb.sys.columns c CROSS JOIN msdb.sys.columns c2;

I’m just going to look at the performance characteristics this time. The execution plans are the same as for the earlier query. The two queries are:

SELECT * FROM dbo.MoreNumbers WHERE SomeNumber != 24;
SELECT * FROM dbo.MoreNumbers WHERE SomeNumber < 24 OR SomeNumber > 24;

Query 1:

Duration 97ms
CPU 77ms
Logical Reads 7624

Query 2:

Duration 98ms
CPU 75ms
Logical Reads 7624

Again identical.

Just like with the pointless WHERE clause predicate last week, we have a query change that has had no effect on the query performance. Now, to be honest, there are some query form changes that can improve performance. For example, converting a set of OR predicates to UNION can improve query performance sometimes (and leave it unchanged in others), and so these kinds of rewrites do need to be tested to see if they’re useful.

More importantly though, those of us who are posting on forums and advising others have a responsibility to do these tests before we recommend changes to others, as they may very well not do them. If we don’t, we’re propagating myths and not helping the advancement of our field.

On the addition of useless where clauses

I remember a forum thread from a while back. The question was on how to get rid of the index scan that was in the query plan. Now that’s a poor question in the first place, as the scan might not be a problem, but it’s the first answer that really caught my attention.

Since the primary key is on an identity column, you can add a clause like ID > 0 to the query, then SQL will use an index seek.

Technically that’s correct. If the table has an identity column with the default properties (We’ll call it ID) and the clustered index is on that identity column, then a WHERE clause of the form WHERE ID > 0 AND <any other predicates on that table> can indeed execute with a clustered index seek (although it’s in no way guaranteed to do so). But is it a useful thing to do?

Time for a made up table and a test query.

CREATE TABLE dbo.Orders(
  OrderID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
  OrderDate DATETIME2(7) NULL,
  ShipmentRef CHAR(10) NULL,
  ShipmentDate DATE NULL,
  Status VARCHAR(20) NOT NULL
);

That’ll do the job. And then a few hundred thousand rows via SQL Data Generator and we’re good to go.

And for a query that has a nasty index scan, how about

SELECT  OrderDate,
    ShipmentDate,
    Status
  FROM    dbo.Orders
  WHERE   LTRIM(RTRIM(Status)) = 'Delivered';

ClusteredIndexScan

Now, that’s running as a clustered index scan because the predicate’s not SARGable and besides, there’s no index on that column, but let’s pretend we don’t know that.

If I add a WHERE clause predicate that filters no row out, can I get a query plan with an index seek?

SELECT  OrderDate,
    ShipmentDate,
    Status
  FROM    dbo.Orders
  WHERE   LTRIM(RTRIM(Status)) = 'Delivered'
    AND OrderID > 0;

Why yes, I can.

ClusteredIndexSeek

Op Success? Well…

The goal of performance tuning is to improve the performance of a query, not to change operators in a query plan. The plan is a tool, not a goal.

Have we, by adding a WHERE clause predicate that filters out no rows, improved performance of the query? This needs an extended events session to answer. Nothing fancy, just a sql_statement_completed event will do the trick.

I ran each query 10 times, copied the captured events into Excel and averaged them:

Query with just the LTRIM(RTRIM(Status)) = ‘Delivered’
CPU: 77ms
Duration: 543ms

Query with LTRIM(RTRIM(Status)) = ‘Delivered’ AND OrderID > 0
CPU: 80ms
Duration: 550ms

We haven’t tuned that query. I won’t say we’ve made it slower either, the differences are well within the error range on our measuring, but there’s definitely no meaningful performance gain.

There’s no gain because we haven’t changed how the query executes. A scan, and in this case it will be a scan of the entire index, will likely use the non-leaf levels of the b-tree to locate the logical first page of the leaf level, then will read the entire leaf level. The seek we managed to generate will use the b-tree to find the value 0 in the clustered index key, that’s what makes it a seek. Since the column is an identity starting at 1, that means the first row read will be on the logical first page of the leaf level, then it will read the entire leaf level.

Both will do the same amount of work, and so we haven’t done anything useful to the query by adding a WHERE clause that filters out no rows.

Scans are not always bad. If a query needs to read every row of a table, that’s a scan and effort shouldn’t be expended trying to make it an index seek.

To improve the performance of a query, we need to make changes that reduce the work needed to run the query. That often starts with reducing the amount of data that the query reads, by changing the query so that it can use indexes effectively and/or adding indexes to support the query. Not by adding pointless pieces to a query just to change plan operators from ones that are believed to be bad to ones that are believed to be good. Doing that is just a waste of time and effort.

Does an index scan always read the entire index?

No.

That’s a bit short for a blog post, so let me explain. First, the difference between a seek and a scan.

A seek is an operation which navigates down the index’s b-tree looking for a row or for the start/end of a range of rows. A seek requires a predicate and that predicate must be of the form that can be used as a search argument (SARGable).

A scan is a read of the leaf level of an index, possibly also reading the intermediate pages as well.

The key there is that a seek requires a predicate. If there’s no predicate, we cannot have a seek and hence must have a scan.

Let’s look at a couple of examples. I’m going to use a simple Numbers table as it’s perfectly adequate for what we’re doing here.

CREATE TABLE Numbers (
  Number INT NOT NULL PRIMARY KEY CLUSTERED
);

INSERT INTO Numbers (Number)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM msdb.sys.columns c1 CROSS JOIN msdb.sys.columns c2;

With the table created, I want to look at how many pages that clustered index has. It won’t be a huge number, the table is very small.

SELECT OBJECT_NAME(object_id) AS TableName, index_level, page_count, record_count, avg_record_size_in_bytes
  FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Numbers'),1,NULL, 'Detailed');

IndexPages

Three level deep clustered index, 1608 pages at the leaf and a total of 1614 pages in all levels.

Let’s start with something basic:

SET STATISTICS IO ON;
GO

SELECT * FROM Numbers;

FullTableScan

Table ‘Numbers’. Scan count 1, logical reads 1615, physical reads 0

That read the entire index, every page at every level. The extra page was likely the IAM page for that index. That’s kinda what we expect a scan to be.

For contrast, let’s try an obvious seek.

SELECT * FROM Numbers WHERE Number = 137;

Seek

Table ‘Numbers’. Scan count 0, logical reads 3, physical reads 0.

A single-row seek operation does three reads, which makes sense since we have a three-level deep clustered index.

Now, what about this?

SELECT TOP (1) * FROM Numbers;

It can’t be a seek operation, there’s no predicate. The only way this can be implemented is with a scan.

Top1Scan

It is indeed a scan, but did it read the entire table?

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

No. A scan of the entire index is over 1600 pages. This query read three. It’s a scan, but it’s a scan which stopped after reading one page of the leaf (the other two are likely the root and intermediate pages, used to locate the first page in the leaf).

The scan read one row and then stopped, because that’s all that was needed. It did that, because there was a 1 row row-goal added to the query. For more details on row goals, see Paul White’s article on the subject.

There’s other cases where a scan won’t read the entire index leaf level too.

Aggregations, well MIN and MAX of an indexed column:

SELECT MIN(Number) FROM Numbers;

IndexScanMin

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

EXISTS:

IF EXISTS(SELECT 1 FROM Numbers)
SELECT 1;

ExistsScan

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

In conclusion, a seek operation requires a predicate. Without a predicate, a query has to be evaluated with a scan, but a scan doesn’t always mean that the entire table is read.

When naming transactions causes an error

For the last part of the series on transactions, I’m going to look at a problem that I ran across entirely by chance while doing some performance tuning. A procedure had a transaction in it, and it was named. Something like this:

CREATE PROCEDURE RemoveUser (@UserID INT)
AS

BEGIN TRY
  BEGIN TRANSACTION Important
    DELETE FROM Users WHERE UserID = @UserID
  COMMIT TRANSACTION Important
END TRY
BEGIN CATCH
  ROLLBACK TRANSACTION Important
END CATCH
GO

The error’s squelched, not handled, but let’s ignore that, it’s not the point of this post. The example as written doesn’t need a transaction, but the real procedure was a tad more complex. The transaction is named, and the name is specified in the COMMIT and ROLLBACK as well.

Before I go into the problem with this code, I’m going to quote from three Books Online entries.

BEGIN TRANSACTION

transaction_name

Is the name assigned to the transaction. transaction_name must conform to the rules for identifiers, but identifiers longer than 32 characters are not allowed. Use transaction names only on the outermost pair of nested BEGIN…COMMIT or BEGIN…ROLLBACK statements. transaction_name is always case sensitive, even when the instance of SQL Server is not case sensitive.

COMMIT TRANSACTION

transaction_name

Is ignored by the SQL Server Database Engine. transaction_name specifies a transaction name assigned by a previous BEGIN TRANSACTION. transaction_name must conform to the rules for identifiers, but cannot exceed 32 characters. transaction_name can be used as a readability aid by indicating to programmers which nested BEGIN TRANSACTION the COMMIT TRANSACTION is associated with.

ROLLBACK TRANSACTION

transaction_name

Is the name assigned to the transaction on BEGIN TRANSACTION. transaction_name must conform to the rules for identifiers, but only the first 32 characters of the transaction name are used. When nesting transactions, transaction_name must be the name from the outermost BEGIN TRANSACTION statement. transaction_name is always case sensitive, even when the instance of SQL Server is not case sensitive.

I do love how Books Online refers to nested transactions as though they were real things…

Now that we’re perused the documentation, let’s go back to the code. As-is it runs fine, however let’s try that procedure called from within another transaction.

BEGIN TRANSACTION
  EXEC dbo.RemovePosts @UserID = 152;
  EXEC dbo.RemoveThreads @UserID = 152;
  EXEC dbo.RemoveUser @UserID = 152;
COMMIT

Some sort of ‘remove all my details’ functionality, except that I’ve ‘forgotten’ to check one foreign key relationship and so the delete inside the inner transaction is going to throw an error. Execution is going to be transferred to the CATCH block and the ROLLBACK TRANSACTION is going to run.

Msg 6401, Level 16, State 1, Procedure RemoveUser, Line 10
Cannot roll back Important. No transaction or savepoint of that name was found.

Whoops.

The documentation for ROLLBACK TRANSACTION states ‘When nesting transactions, transaction_name must be the name from the outermost BEGIN TRANSACTION statement.’. In this case a second transaction has been wrapped around the first, the outer transaction had no name and hence an error occurred.

No big deal though, right? It’s just a different error being thrown, right? It’s not as if we were handling and logging the error properly in the first place.

Not quite.

OpenTransaction

The error was thrown by the ROLLBACK statement. As such, the transaction is still open, the locks are held and the transaction log space can’t be reused. Unless the application that called this was checking for open transactions, that transaction could potentially be left open for quite some amount of time, causing blocking and/or the transaction log to grow.

It’s not just that someone in the future might call the code from another stored proc within a transaction, it’s also that it might be that the code is called from an application which started a transaction. Or called from SSIS which started a transaction. It’s very hard to ensure that code is never called from within an existing transaction

In conclusion, if you want to put a name on a BEGIN TRANSACTION, for documentation purposes or to help with debugging, go ahead. If you want to put a name on a COMMIT, that’s fine too, although it has no effect at all on how the code runs. Don’t put names on the ROLLBACK ever, as it leaves the code with a potential hidden bug.

This concludes the short series on transactions. The full list is:

  1. A Mess of Nested Transactions
  2. Savepoints and conditional transactions
  3. Why would you want to name a transaction?
  4. When naming transactions causes an error (this post)

Why would you want to name a transaction?

Something that i keep seeing in documentation, in forum code and in real systems is transactions that have names

BEGIN TRANSACTION VeryImportantProcess

<do stuff>

COMMIT VeryImportantProcess

Now the question came up on the forums a while back as to what the point of this is. Why would you name a transaction, what effect does it have on behaviour and how is it treated internally?

In the first post in this series I had a look at nested transactions and in that post we saw that nested transactions don’t have any effect on how the transactions are treated. The BEGIN TRANSACTION statement isn’t even logged if there’s already an open transaction in that session and a ROLLBACK still rolls everything back (unless it’s specifically rolling back to a savepoint)

First question then regarding naming of transactions is whether it has any effect at all on that behaviour.

CREATE TABLE TestTrans (
ID int,
InsertDate DATETIME
);

BEGIN TRANSACTION OuterTran
  INSERT INTO TestTrans (ID, InsertDate) VALUES (1,GETDATE())
  BEGIN TRANSACTION InnerTran
    INSERT INTO TestTrans (ID, InsertDate) VALUES (2,GETDATE())
  COMMIT TRANSACTION InnerTran
ROLLBACK TRANSACTION OuterTran

SELECT * FROM TestTrans

NoRecords

Clearly none. Even with all the transactions named, a commit of a named transaction did not commit anything until the transaction nest level reached 0. A rollback still rolls back everything.

Ok, so maybe there’s some internal usage for a transaction’s name. Let’s have a look at the transaction log to see.

CHECKPOINT;

BEGIN TRANSACTION OuterTran
  INSERT INTO TestTrans (ID, InsertDate) VALUES (1,GETDATE())
  BEGIN TRANSACTION InnerTran
    INSERT INTO TestTrans (ID, InsertDate) VALUES (2,GETDATE())
  COMMIT TRANSACTION InnerTran
ROLLBACK TRANSACTION OuterTran

SELECT * FROM fn_DBLog(NULL, NULL)

TranLogContents

The outer transaction’s name appears in the log, replacing what was ‘user_transaction’ previously, but other than that there’s no differences. The inner transaction still doesn’t appear anywhere in the log, neither does the inner commit, and the rollback doesn’t reference the transaction name anywhere.

So what conclusion can we come to here? Pretty much that naming of transactions has one real use, as a form of documentation. Instead of putting a comment above an BEGIN TRANSACTION we can give the transaction a name that indicates what the transaction does, That’s about the only real use.

With two exceptions.

There’s a seldom used option on a BEGIN TRANACTION, an optional WITH clause, Using that along with the transaction name can have some uses in exceptional circumstances.

BEGIN TRANSACTION VeryImportantProcess WITH MARK

<do stuff>

COMMIT TRANSACTION VeryImportantProcess

Now if we run an example of that and look into the log, there’s something different there. Interestingly, the difference isn’t on the BEGIN_XACT, it’s on the COMMIT_XACT log record.

2015_10_05_15_55_36_SQLQuery1.sql_EXCALIBUR.TransactionDemo_Excalibur_Gail_54_Microsoft_SQL_

The question still stands though, why do this? Why name a transaction and put a mark of that into the transaction log. What  benefit does it give?

The mark can be used with a rather poorly known and seldom used pair of options on the restore command (when restoring transaction log backups)

  • WITH STOPAFTERMARK
  • WITH STOPBEFOREMARK

These two, similar to the better known STOPAT can be used to stop a transaction log restore part way through a log backup. Unlike STOPAT, they don’t take a time, but rather take a string, a name that was used for a marked transaction name.

Now to be honest there’s not all that much use for this. If a user drops a table or otherwise messes data up requiring a restore to the point before that happened they’re not going to be nice and run that accidental data mess up in a named, marked transaction.

Where it can be useful is when the transaction crossed multiple databases and/or multiple servers. The log mark appears in all of them and can therefore be used to restore multiple databases to a consistent state where that consistent state is not a single point in time.

Still, it’s useful to have in the toolbox just in case some circumstance comes up where it’s needed.

Edit: It was pointed out, both on twitter and in the comments, that adding a name to a transaction reflects in the transaction-related DMVs and potentially makes debugging easier. Absolutely, it can do and it’s a valid reason to name transactions, I’m lumping that in with ‘documentation’ in the discussions here.

There’s another case where naming transactions can affect behaviour and in this case usually in a bad way. I’ll look at in in more detail in part 4 of this series, but for now will just reference the relevant segment in Books Online:

transaction_name Is the name assigned to the transaction on BEGIN TRANSACTION. transaction_name must conform to the rules for identifiers, but only the first 32 characters of the transaction name are used. When nesting transactions, transaction_name must be the name from the outermost BEGIN TRANSACTION statement. transaction_name is always case sensitive, even when the instance of SQL Server is not case sensitive.

The full list is:

  1. A Mess of Nested Transactions
  2. Savepoints and conditional transactions
  3. Why would you want to name a transaction? (This post)
  4. When naming transactions causes an error

Savepoints and conditional transactions

This is the second in a short series on transactions. In the previous part I discussed nested transactions and showed that they actually don’t exist. SQL has syntax which may suggest that nested transactions exist, but it’s just a syntactical lie.

One thing I suggested in that post was that transactions should be started and committed (or rolled back) at one level only.

That’s a nice idea, but it often doesn’t work when dealing with existing systems which may have been developed with less attention to transactions than ideal. What happens when a stored procedure needs to run in a transaction and potentially roll back its changes, but can be called either within an existing transaction or independently?

The answer to that comes in two parts, first the conditional starting of a transaction and secondly save points.

I should mention that this is fairly advanced transaction management, it’s not something I see in production code very often. It should be considered carefully before being used, as if the people supporting the code don’t understand this, there could be trouble.

Let’s start with some setup. Some random tables and a stored procedure which inserts into parent and child tables.

CREATE TABLE Parent (
    SomeID INT IDENTITY PRIMARY KEY,
    SomeRandomValue VARCHAR(50),
    NotificationRequired BIT DEFAULT 0
);

CREATE TABLE Child (
    SomeOtherID INT IDENTITY PRIMARY KEY,
    ParentID INT,
    State INT,
    SomeValue INT
);

CREATE TABLE Notifications (
    ParentID INT,
    NotificationText VARCHAR(1000),
    NotificationDate DATETIME DEFAULT GETDATE()
);

Let’s say an outer procedure which may insert into the parent table, inserts into the child table and then calls another procedure. The other procedure inserts a notification into a table.

CREATE PROCEDURE OuterProc (@SomeValue VARCHAR(50), @SomeOtherValue INT)
AS

DECLARE @ParentID INT;

BEGIN TRY
    BEGIN TRANSACTION

    IF NOT EXISTS ( SELECT  1
                    FROM    dbo.Parent
                    WHERE   SomeRandomValue = @SomeValue )
        BEGIN
            INSERT  INTO dbo.Parent
                    (SomeRandomValue)
            VALUES  (@SomeValue);

            SELECT  @ParentID = @@IDENTITY;
        END
    ELSE
        SELECT  @ParentID = SomeID
        FROM    dbo.Parent
        WHERE   SomeRandomValue = @SomeValue;

    INSERT  INTO dbo.Child
            (ParentID, SomeValue)
    VALUES  (@ParentID, @SomeOtherValue);

    UPDATE  dbo.Parent
    SET     NotificationRequired = 1
    WHERE   SomeID = @ParentID;

    EXEC NotificationProc @ParentID;

    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;
-- other error handling here
END CATCH
GO

There are some potential race conditions in there, but let’s ignore those for now, it’s not the point of this post.

The requirement for the notification procedure is that if it throws an error all of its changes must be undone however the changes in the outer procedure must still commit. The notification procedure can be called independently or from within the OuterProc

The starting code for the notification procedure is:

CREATE PROCEDURE NotificationProc (@AnID INT)
AS
    UPDATE  dbo.Parent
    SET     NotificationRequired = 0
    WHERE   SomeID = @AnID

    INSERT  INTO dbo.Notifications
            (ParentID,
             NotificationText,
             NotificationDate
            )
            SELECT  SomeID,
                    SomeRandomValue,
                    GETDATE()
            FROM    dbo.Parent
            WHERE   SomeID = @AnID;

GO

This currently has no transaction management at all. Let’s start by implementing the first requirement, if called from outside of any transaction, it must begin a transaction and either commit or rollback all changes.

To do this, we can check the value of @@Trancount to see whether a transaction should be started or not.

CREATE PROCEDURE NotificationProc (@AnID INT)
AS
    DECLARE @Independent BIT = 0;
    IF @@TRANCOUNT = 0  -- proc called from outside of any transaction
        SET @Independent = 1;

    BEGIN TRY
        IF @Independent = 1
            BEGIN TRANSACTION;

        UPDATE  dbo.Parent
        SET     NotificationRequired = 0
        WHERE   SomeID = @AnID;

        INSERT  INTO dbo.Notifications
                (ParentID,
                 NotificationText,
                 NotificationDate
                )
                SELECT  SomeID,
                        SomeRandomValue,
                        GETDATE()
                FROM    dbo.Parent
                WHERE   SomeID = @AnID;

        IF @Independent = 1
            COMMIT TRANSACTION;

    END TRY
    BEGIN CATCH

        IF @Independent = 1
            ROLLBACK TRANSACTION;

        THROW;

    END CATCH;

GO

The THROW is there so that any error can be passed up to the caller to be logged or otherwise handled. It also ensures that it fires any catch block in the calling code.

That’s the first half, the transaction handling if called independently, but it doesn’t help with the second requirement, that the changes made in the notification procedure roll back if there’s an error, but leave the changes made in the calling procedure unaffected and the transaction open. To do that, we need a savepoint.

From Books Online:

A user can set a savepoint, or marker, within a transaction. The savepoint defines a location to which a transaction can return if part of the transaction is conditionally canceled. If a transaction is rolled back to a savepoint, it must proceed to completion with more Transact-SQL statements if needed and a COMMIT TRANSACTION statement, or it must be canceled altogether by rolling the transaction back to its beginning.

It’s not a nested transaction, it’s not like Oracle’s autonomous transactions, it’s just a point within the transaction to which we can roll back to. It lets us undo the most recent part of a transaction.

ALTER PROCEDURE NotificationProc (@AnID INT)
AS
    DECLARE @Independent BIT = 0;
    IF @@TRANCOUNT = 0  -- proc is being called from outside of any transaction
        SET @Independent = 1;

    BEGIN TRY
        IF @Independent = 1
            BEGIN TRANSACTION;
        ELSE
            SAVE TRANSACTION Notifications; -- Define a savepoint which we can roll the transaction back to.

        UPDATE  dbo.Parent
        SET     NotificationRequired = 0
        WHERE   SomeID = @AnID;

        INSERT  INTO dbo.Notifications
                (ParentID,
                 NotificationText,
                 NotificationDate
                )
                SELECT  SomeID,
                        SomeRandomValue,
                        GETDATE()
                FROM    dbo.Parent
                WHERE   SomeID = @AnID;

        IF @Independent = 1
            COMMIT TRANSACTION;

    END TRY
    BEGIN CATCH

        IF @Independent = 1
            ROLLBACK TRANSACTION; -- roll back the entire transaction
        ELSE
            ROLLBACK TRANSACTION Notifications; -- roll back to the save point

        SELECT  ERROR_NUMBER(),
                ERROR_MESSAGE(),
                ERROR_LINE(); -- In reality, log this somewhere

    END CATCH;

GO

Looks good, let’s test. To test, I’m going to add a check constraint to the Notifications table that will be violated by the insert in the Notification procedure. This is to simulate the kind of errors that can happen in a real system (key violations, data type conversion failures, check constraint violations, etc). I’ll also add some sample data to the Parent table.

ALTER TABLE Notifications ADD CONSTRAINT Testing CHECK (NotificationDate > '2020-01-01')

INSERT INTO dbo.Parent (SomeRandomValue, NotificationRequired)
VALUES  ('abc',0), ('def',1),('ghi',0),('jkl',1),('mno',1);

First test, outside of a transaction. The requirement is that neither of the changes in the proc remain after the execution (because of the error)

SELECT SomeID, SomeRandomValue, NotificationRequired FROM dbo.Parent WHERE SomeID = 2;

EXEC dbo.NotificationProc @AnID = 2;

SELECT SomeID, SomeRandomValue, NotificationRequired FROM dbo.Parent WHERE SomeID = 2;
SELECT * FROM dbo.Notifications WHERE ParentID = 2;

IndependentTransaction

Success. When run independently the two data modifications were run in a transaction and when the error occurred, were rolled back.

Now let’s try from within an existing transaction. The changes made in the outer procedure (insert of a row into Child and update NotificationRequired to 1 in Parent must commit, but the changes made in the inner proc must not)

SELECT SomeID, SomeRandomValue, NotificationRequired FROM dbo.Parent WHERE SomeID = 1;

EXEC dbo.OuterProc @SomeValue = 'abc', @SomeOtherValue = 7;

SELECT SomeID, SomeRandomValue, NotificationRequired FROM dbo.Parent WHERE SomeID = 1;
SELECT SomeOtherID, ParentID, SomeValue FROM dbo.Child WHERE ParentID = 1;
SELECT * FROM dbo.Notifications WHERE ParentID = 1;

NestedTransaction

Again, exactly the desired behaviour. The changes made in the outer procedure were committed, the changes in the inner procedure, the procedure where the error was thrown, were rolled back.

Used correctly, savepoints can be a powerful mechanism for managing transactions in SQL Server. Unfortunately they’re not well known and as such their use can also make code much harder for later developers to debug.

The next part of the series on transactions will look at why (and why not) you’d want to name a transaction.

The full list is:

  1. A Mess of Nested Transactions
  2. Savepoints and conditional transactions (This post)
  3. Why would you want to name a transaction?
  4. When naming transactions causes an error