T-SQL

On the addition of useless where clauses

I remember a forum thread from a while back. The question was on how to get rid of the index scan that was in the query plan. Now that’s a poor question in the first place, as the scan might not be a problem, but it’s the first answer that really caught my attention.

Since the primary key is on an identity column, you can add a clause like ID > 0 to the query, then SQL will use an index seek.

Technically that’s correct. If the table has an identity column with the default properties (We’ll call it ID) and the clustered index is on that identity column, then a WHERE clause of the form WHERE ID > 0 AND <any other predicates on that table> can indeed execute with a clustered index seek (although it’s in no way guaranteed to do so). But is it a useful thing to do?

Time for a made up table and a test query.

CREATE TABLE dbo.Orders(
  OrderID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
  OrderDate DATETIME2(7) NULL,
  ShipmentRef CHAR(10) NULL,
  ShipmentDate DATE NULL,
  Status VARCHAR(20) NOT NULL
);

That’ll do the job. And then a few hundred thousand rows via SQL Data Generator and we’re good to go.

And for a query that has a nasty index scan, how about

SELECT  OrderDate,
    ShipmentDate,
    Status
  FROM    dbo.Orders
  WHERE   LTRIM(RTRIM(Status)) = 'Delivered';

ClusteredIndexScan

Now, that’s running as a clustered index scan because the predicate’s not SARGable and besides, there’s no index on that column, but let’s pretend we don’t know that.

If I add a WHERE clause predicate that filters no row out, can I get a query plan with an index seek?

SELECT  OrderDate,
    ShipmentDate,
    Status
  FROM    dbo.Orders
  WHERE   LTRIM(RTRIM(Status)) = 'Delivered'
    AND OrderID > 0;

Why yes, I can.

ClusteredIndexSeek

Op Success? Well…

The goal of performance tuning is to improve the performance of a query, not to change operators in a query plan. The plan is a tool, not a goal.

Have we, by adding a WHERE clause predicate that filters out no rows, improved performance of the query? This needs an extended events session to answer. Nothing fancy, just a sql_statement_completed event will do the trick.

I ran each query 10 times, copied the captured events into Excel and averaged them:

Query with just the LTRIM(RTRIM(Status)) = ‘Delivered’
CPU: 77ms
Duration: 543ms

Query with LTRIM(RTRIM(Status)) = ‘Delivered’ AND OrderID > 0
CPU: 80ms
Duration: 550ms

We haven’t tuned that query. I won’t say we’ve made it slower either, the differences are well within the error range on our measuring, but there’s definitely no meaningful performance gain.

There’s no gain because we haven’t changed how the query executes. A scan, and in this case it will be a scan of the entire index, will likely use the non-leaf levels of the b-tree to locate the logical first page of the leaf level, then will read the entire leaf level. The seek we managed to generate will use the b-tree to find the value 0 in the clustered index key, that’s what makes it a seek. Since the column is an identity starting at 1, that means the first row read will be on the logical first page of the leaf level, then it will read the entire leaf level.

Both will do the same amount of work, and so we haven’t done anything useful to the query by adding a WHERE clause that filters out no rows.

Scans are not always bad. If a query needs to read every row of a table, that’s a scan and effort shouldn’t be expended trying to make it an index seek.

To improve the performance of a query, we need to make changes that reduce the work needed to run the query. That often starts with reducing the amount of data that the query reads, by changing the query so that it can use indexes effectively and/or adding indexes to support the query. Not by adding pointless pieces to a query just to change plan operators from ones that are believed to be bad to ones that are believed to be good. Doing that is just a waste of time and effort.

Does an index scan always read the entire index?

No.

That’s a bit short for a blog post, so let me explain. First, the difference between a seek and a scan.

A seek is an operation which navigates down the index’s b-tree looking for a row or for the start/end of a range of rows. A seek requires a predicate and that predicate must be of the form that can be used as a search argument (SARGable).

A scan is a read of the leaf level of an index, possibly also reading the intermediate pages as well.

The key there is that a seek requires a predicate. If there’s no predicate, we cannot have a seek and hence must have a scan.

Let’s look at a couple of examples. I’m going to use a simple Numbers table as it’s perfectly adequate for what we’re doing here.

CREATE TABLE Numbers (
  Number INT NOT NULL PRIMARY KEY CLUSTERED
);

INSERT INTO Numbers (Number)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM msdb.sys.columns c1 CROSS JOIN msdb.sys.columns c2;

With the table created, I want to look at how many pages that clustered index has. It won’t be a huge number, the table is very small.

SELECT OBJECT_NAME(object_id) AS TableName, index_level, page_count, record_count, avg_record_size_in_bytes
  FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Numbers'),1,NULL, 'Detailed');

IndexPages

Three level deep clustered index, 1608 pages at the leaf and a total of 1614 pages in all levels.

Let’s start with something basic:

SET STATISTICS IO ON;
GO

SELECT * FROM Numbers;

FullTableScan

Table ‘Numbers’. Scan count 1, logical reads 1615, physical reads 0

That read the entire index, every page at every level. The extra page was likely the IAM page for that index. That’s kinda what we expect a scan to be.

For contrast, let’s try an obvious seek.

SELECT * FROM Numbers WHERE Number = 137;

Seek

Table ‘Numbers’. Scan count 0, logical reads 3, physical reads 0.

A single-row seek operation does three reads, which makes sense since we have a three-level deep clustered index.

Now, what about this?

SELECT TOP (1) * FROM Numbers;

It can’t be a seek operation, there’s no predicate. The only way this can be implemented is with a scan.

Top1Scan

It is indeed a scan, but did it read the entire table?

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

No. A scan of the entire index is over 1600 pages. This query read three. It’s a scan, but it’s a scan which stopped after reading one page of the leaf (the other two are likely the root and intermediate pages, used to locate the first page in the leaf).

The scan read one row and then stopped, because that’s all that was needed. It did that, because there was a 1 row row-goal added to the query. For more details on row goals, see Paul White’s article on the subject.

There’s other cases where a scan won’t read the entire index leaf level too.

Aggregations, well MIN and MAX of an indexed column:

SELECT MIN(Number) FROM Numbers;

IndexScanMin

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

EXISTS:

IF EXISTS(SELECT 1 FROM Numbers)
SELECT 1;

ExistsScan

Table ‘Numbers’. Scan count 1, logical reads 3, physical reads 0.

In conclusion, a seek operation requires a predicate. Without a predicate, a query has to be evaluated with a scan, but a scan doesn’t always mean that the entire table is read.

When naming transactions causes an error

For the last part of the series on transactions, I’m going to look at a problem that I ran across entirely by chance while doing some performance tuning. A procedure had a transaction in it, and it was named. Something like this:

CREATE PROCEDURE RemoveUser (@UserID INT)
AS

BEGIN TRY
  BEGIN TRANSACTION Important
    DELETE FROM Users WHERE UserID = @UserID
  COMMIT TRANSACTION Important
END TRY
BEGIN CATCH
  ROLLBACK TRANSACTION Important
END CATCH
GO

The error’s squelched, not handled, but let’s ignore that, it’s not the point of this post. The example as written doesn’t need a transaction, but the real procedure was a tad more complex. The transaction is named, and the name is specified in the COMMIT and ROLLBACK as well.

Before I go into the problem with this code, I’m going to quote from three Books Online entries.

BEGIN TRANSACTION

transaction_name

Is the name assigned to the transaction. transaction_name must conform to the rules for identifiers, but identifiers longer than 32 characters are not allowed. Use transaction names only on the outermost pair of nested BEGIN…COMMIT or BEGIN…ROLLBACK statements. transaction_name is always case sensitive, even when the instance of SQL Server is not case sensitive.

COMMIT TRANSACTION

transaction_name

Is ignored by the SQL Server Database Engine. transaction_name specifies a transaction name assigned by a previous BEGIN TRANSACTION. transaction_name must conform to the rules for identifiers, but cannot exceed 32 characters. transaction_name can be used as a readability aid by indicating to programmers which nested BEGIN TRANSACTION the COMMIT TRANSACTION is associated with.

ROLLBACK TRANSACTION

transaction_name

Is the name assigned to the transaction on BEGIN TRANSACTION. transaction_name must conform to the rules for identifiers, but only the first 32 characters of the transaction name are used. When nesting transactions, transaction_name must be the name from the outermost BEGIN TRANSACTION statement. transaction_name is always case sensitive, even when the instance of SQL Server is not case sensitive.

I do love how Books Online refers to nested transactions as though they were real things…

Now that we’re perused the documentation, let’s go back to the code. As-is it runs fine, however let’s try that procedure called from within another transaction.

BEGIN TRANSACTION
  EXEC dbo.RemovePosts @UserID = 152;
  EXEC dbo.RemoveThreads @UserID = 152;
  EXEC dbo.RemoveUser @UserID = 152;
COMMIT

Some sort of ‘remove all my details’ functionality, except that I’ve ‘forgotten’ to check one foreign key relationship and so the delete inside the inner transaction is going to throw an error. Execution is going to be transferred to the CATCH block and the ROLLBACK TRANSACTION is going to run.

Msg 6401, Level 16, State 1, Procedure RemoveUser, Line 10
Cannot roll back Important. No transaction or savepoint of that name was found.

Whoops.

The documentation for ROLLBACK TRANSACTION states ‘When nesting transactions, transaction_name must be the name from the outermost BEGIN TRANSACTION statement.’. In this case a second transaction has been wrapped around the first, the outer transaction had no name and hence an error occurred.

No big deal though, right? It’s just a different error being thrown, right? It’s not as if we were handling and logging the error properly in the first place.

Not quite.

OpenTransaction

The error was thrown by the ROLLBACK statement. As such, the transaction is still open, the locks are held and the transaction log space can’t be reused. Unless the application that called this was checking for open transactions, that transaction could potentially be left open for quite some amount of time, causing blocking and/or the transaction log to grow.

It’s not just that someone in the future might call the code from another stored proc within a transaction, it’s also that it might be that the code is called from an application which started a transaction. Or called from SSIS which started a transaction. It’s very hard to ensure that code is never called from within an existing transaction

In conclusion, if you want to put a name on a BEGIN TRANSACTION, for documentation purposes or to help with debugging, go ahead. If you want to put a name on a COMMIT, that’s fine too, although it has no effect at all on how the code runs. Don’t put names on the ROLLBACK ever, as it leaves the code with a potential hidden bug.

This concludes the short series on transactions. The full list is:

  1. A Mess of Nested Transactions
  2. Savepoints and conditional transactions
  3. Why would you want to name a transaction?
  4. When naming transactions causes an error (this post)

Why would you want to name a transaction?

Something that i keep seeing in documentation, in forum code and in real systems is transactions that have names

BEGIN TRANSACTION VeryImportantProcess

<do stuff>

COMMIT VeryImportantProcess

Now the question came up on the forums a while back as to what the point of this is. Why would you name a transaction, what effect does it have on behaviour and how is it treated internally?

In the first post in this series I had a look at nested transactions and in that post we saw that nested transactions don’t have any effect on how the transactions are treated. The BEGIN TRANSACTION statement isn’t even logged if there’s already an open transaction in that session and a ROLLBACK still rolls everything back (unless it’s specifically rolling back to a savepoint)

First question then regarding naming of transactions is whether it has any effect at all on that behaviour.

CREATE TABLE TestTrans (
ID int,
InsertDate DATETIME
);

BEGIN TRANSACTION OuterTran
  INSERT INTO TestTrans (ID, InsertDate) VALUES (1,GETDATE())
  BEGIN TRANSACTION InnerTran
    INSERT INTO TestTrans (ID, InsertDate) VALUES (2,GETDATE())
  COMMIT TRANSACTION InnerTran
ROLLBACK TRANSACTION OuterTran

SELECT * FROM TestTrans

NoRecords

Clearly none. Even with all the transactions named, a commit of a named transaction did not commit anything until the transaction nest level reached 0. A rollback still rolls back everything.

Ok, so maybe there’s some internal usage for a transaction’s name. Let’s have a look at the transaction log to see.

CHECKPOINT;

BEGIN TRANSACTION OuterTran
  INSERT INTO TestTrans (ID, InsertDate) VALUES (1,GETDATE())
  BEGIN TRANSACTION InnerTran
    INSERT INTO TestTrans (ID, InsertDate) VALUES (2,GETDATE())
  COMMIT TRANSACTION InnerTran
ROLLBACK TRANSACTION OuterTran

SELECT * FROM fn_DBLog(NULL, NULL)

TranLogContents

The outer transaction’s name appears in the log, replacing what was ‘user_transaction’ previously, but other than that there’s no differences. The inner transaction still doesn’t appear anywhere in the log, neither does the inner commit, and the rollback doesn’t reference the transaction name anywhere.

So what conclusion can we come to here? Pretty much that naming of transactions has one real use, as a form of documentation. Instead of putting a comment above an BEGIN TRANSACTION we can give the transaction a name that indicates what the transaction does, That’s about the only real use.

With two exceptions.

There’s a seldom used option on a BEGIN TRANACTION, an optional WITH clause, Using that along with the transaction name can have some uses in exceptional circumstances.

BEGIN TRANSACTION VeryImportantProcess WITH MARK

<do stuff>

COMMIT TRANSACTION VeryImportantProcess

Now if we run an example of that and look into the log, there’s something different there. Interestingly, the difference isn’t on the BEGIN_XACT, it’s on the COMMIT_XACT log record.

2015_10_05_15_55_36_SQLQuery1.sql_EXCALIBUR.TransactionDemo_Excalibur_Gail_54_Microsoft_SQL_

The question still stands though, why do this? Why name a transaction and put a mark of that into the transaction log. What  benefit does it give?

The mark can be used with a rather poorly known and seldom used pair of options on the restore command (when restoring transaction log backups)

  • WITH STOPAFTERMARK
  • WITH STOPBEFOREMARK

These two, similar to the better known STOPAT can be used to stop a transaction log restore part way through a log backup. Unlike STOPAT, they don’t take a time, but rather take a string, a name that was used for a marked transaction name.

Now to be honest there’s not all that much use for this. If a user drops a table or otherwise messes data up requiring a restore to the point before that happened they’re not going to be nice and run that accidental data mess up in a named, marked transaction.

Where it can be useful is when the transaction crossed multiple databases and/or multiple servers. The log mark appears in all of them and can therefore be used to restore multiple databases to a consistent state where that consistent state is not a single point in time.

Still, it’s useful to have in the toolbox just in case some circumstance comes up where it’s needed.

Edit: It was pointed out, both on twitter and in the comments, that adding a name to a transaction reflects in the transaction-related DMVs and potentially makes debugging easier. Absolutely, it can do and it’s a valid reason to name transactions, I’m lumping that in with ‘documentation’ in the discussions here.

There’s another case where naming transactions can affect behaviour and in this case usually in a bad way. I’ll look at in in more detail in part 4 of this series, but for now will just reference the relevant segment in Books Online:

transaction_name Is the name assigned to the transaction on BEGIN TRANSACTION. transaction_name must conform to the rules for identifiers, but only the first 32 characters of the transaction name are used. When nesting transactions, transaction_name must be the name from the outermost BEGIN TRANSACTION statement. transaction_name is always case sensitive, even when the instance of SQL Server is not case sensitive.

The full list is:

  1. A Mess of Nested Transactions
  2. Savepoints and conditional transactions
  3. Why would you want to name a transaction? (This post)
  4. When naming transactions causes an error

Savepoints and conditional transactions

This is the second in a short series on transactions. In the previous part I discussed nested transactions and showed that they actually don’t exist. SQL has syntax which may suggest that nested transactions exist, but it’s just a syntactical lie.

One thing I suggested in that post was that transactions should be started and committed (or rolled back) at one level only.

That’s a nice idea, but it often doesn’t work when dealing with existing systems which may have been developed with less attention to transactions than ideal. What happens when a stored procedure needs to run in a transaction and potentially roll back its changes, but can be called either within an existing transaction or independently?

The answer to that comes in two parts, first the conditional starting of a transaction and secondly save points.

I should mention that this is fairly advanced transaction management, it’s not something I see in production code very often. It should be considered carefully before being used, as if the people supporting the code don’t understand this, there could be trouble.

Let’s start with some setup. Some random tables and a stored procedure which inserts into parent and child tables.

CREATE TABLE Parent (
    SomeID INT IDENTITY PRIMARY KEY,
    SomeRandomValue VARCHAR(50),
    NotificationRequired BIT DEFAULT 0
);

CREATE TABLE Child (
    SomeOtherID INT IDENTITY PRIMARY KEY,
    ParentID INT,
    State INT,
    SomeValue INT
);

CREATE TABLE Notifications (
    ParentID INT,
    NotificationText VARCHAR(1000),
    NotificationDate DATETIME DEFAULT GETDATE()
);

Let’s say an outer procedure which may insert into the parent table, inserts into the child table and then calls another procedure. The other procedure inserts a notification into a table.

CREATE PROCEDURE OuterProc (@SomeValue VARCHAR(50), @SomeOtherValue INT)
AS

DECLARE @ParentID INT;

BEGIN TRY
    BEGIN TRANSACTION

    IF NOT EXISTS ( SELECT  1
                    FROM    dbo.Parent
                    WHERE   SomeRandomValue = @SomeValue )
        BEGIN
            INSERT  INTO dbo.Parent
                    (SomeRandomValue)
            VALUES  (@SomeValue);

            SELECT  @ParentID = @@IDENTITY;
        END
    ELSE
        SELECT  @ParentID = SomeID
        FROM    dbo.Parent
        WHERE   SomeRandomValue = @SomeValue;

    INSERT  INTO dbo.Child
            (ParentID, SomeValue)
    VALUES  (@ParentID, @SomeOtherValue);

    UPDATE  dbo.Parent
    SET     NotificationRequired = 1
    WHERE   SomeID = @ParentID;

    EXEC NotificationProc @ParentID;

    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;
-- other error handling here
END CATCH
GO

There are some potential race conditions in there, but let’s ignore those for now, it’s not the point of this post.

The requirement for the notification procedure is that if it throws an error all of its changes must be undone however the changes in the outer procedure must still commit. The notification procedure can be called independently or from within the OuterProc

The starting code for the notification procedure is:

CREATE PROCEDURE NotificationProc (@AnID INT)
AS
    UPDATE  dbo.Parent
    SET     NotificationRequired = 0
    WHERE   SomeID = @AnID

    INSERT  INTO dbo.Notifications
            (ParentID,
             NotificationText,
             NotificationDate
            )
            SELECT  SomeID,
                    SomeRandomValue,
                    GETDATE()
            FROM    dbo.Parent
            WHERE   SomeID = @AnID;

GO

This currently has no transaction management at all. Let’s start by implementing the first requirement, if called from outside of any transaction, it must begin a transaction and either commit or rollback all changes.

To do this, we can check the value of @@Trancount to see whether a transaction should be started or not.

CREATE PROCEDURE NotificationProc (@AnID INT)
AS
    DECLARE @Independent BIT = 0;
    IF @@TRANCOUNT = 0  -- proc called from outside of any transaction
        SET @Independent = 1;

    BEGIN TRY
        IF @Independent = 1
            BEGIN TRANSACTION;

        UPDATE  dbo.Parent
        SET     NotificationRequired = 0
        WHERE   SomeID = @AnID;

        INSERT  INTO dbo.Notifications
                (ParentID,
                 NotificationText,
                 NotificationDate
                )
                SELECT  SomeID,
                        SomeRandomValue,
                        GETDATE()
                FROM    dbo.Parent
                WHERE   SomeID = @AnID;

        IF @Independent = 1
            COMMIT TRANSACTION;

    END TRY
    BEGIN CATCH

        IF @Independent = 1
            ROLLBACK TRANSACTION;

        THROW;

    END CATCH;

GO

The THROW is there so that any error can be passed up to the caller to be logged or otherwise handled. It also ensures that it fires any catch block in the calling code.

That’s the first half, the transaction handling if called independently, but it doesn’t help with the second requirement, that the changes made in the notification procedure roll back if there’s an error, but leave the changes made in the calling procedure unaffected and the transaction open. To do that, we need a savepoint.

From Books Online:

A user can set a savepoint, or marker, within a transaction. The savepoint defines a location to which a transaction can return if part of the transaction is conditionally canceled. If a transaction is rolled back to a savepoint, it must proceed to completion with more Transact-SQL statements if needed and a COMMIT TRANSACTION statement, or it must be canceled altogether by rolling the transaction back to its beginning.

It’s not a nested transaction, it’s not like Oracle’s autonomous transactions, it’s just a point within the transaction to which we can roll back to. It lets us undo the most recent part of a transaction.

ALTER PROCEDURE NotificationProc (@AnID INT)
AS
    DECLARE @Independent BIT = 0;
    IF @@TRANCOUNT = 0  -- proc is being called from outside of any transaction
        SET @Independent = 1;

    BEGIN TRY
        IF @Independent = 1
            BEGIN TRANSACTION;
        ELSE
            SAVE TRANSACTION Notifications; -- Define a savepoint which we can roll the transaction back to.

        UPDATE  dbo.Parent
        SET     NotificationRequired = 0
        WHERE   SomeID = @AnID;

        INSERT  INTO dbo.Notifications
                (ParentID,
                 NotificationText,
                 NotificationDate
                )
                SELECT  SomeID,
                        SomeRandomValue,
                        GETDATE()
                FROM    dbo.Parent
                WHERE   SomeID = @AnID;

        IF @Independent = 1
            COMMIT TRANSACTION;

    END TRY
    BEGIN CATCH

        IF @Independent = 1
            ROLLBACK TRANSACTION; -- roll back the entire transaction
        ELSE
            ROLLBACK TRANSACTION Notifications; -- roll back to the save point

        SELECT  ERROR_NUMBER(),
                ERROR_MESSAGE(),
                ERROR_LINE(); -- In reality, log this somewhere

    END CATCH;

GO

Looks good, let’s test. To test, I’m going to add a check constraint to the Notifications table that will be violated by the insert in the Notification procedure. This is to simulate the kind of errors that can happen in a real system (key violations, data type conversion failures, check constraint violations, etc). I’ll also add some sample data to the Parent table.

ALTER TABLE Notifications ADD CONSTRAINT Testing CHECK (NotificationDate > '2020-01-01')

INSERT INTO dbo.Parent (SomeRandomValue, NotificationRequired)
VALUES  ('abc',0), ('def',1),('ghi',0),('jkl',1),('mno',1);

First test, outside of a transaction. The requirement is that neither of the changes in the proc remain after the execution (because of the error)

SELECT SomeID, SomeRandomValue, NotificationRequired FROM dbo.Parent WHERE SomeID = 2;

EXEC dbo.NotificationProc @AnID = 2;

SELECT SomeID, SomeRandomValue, NotificationRequired FROM dbo.Parent WHERE SomeID = 2;
SELECT * FROM dbo.Notifications WHERE ParentID = 2;

IndependentTransaction

Success. When run independently the two data modifications were run in a transaction and when the error occurred, were rolled back.

Now let’s try from within an existing transaction. The changes made in the outer procedure (insert of a row into Child and update NotificationRequired to 1 in Parent must commit, but the changes made in the inner proc must not)

SELECT SomeID, SomeRandomValue, NotificationRequired FROM dbo.Parent WHERE SomeID = 1;

EXEC dbo.OuterProc @SomeValue = 'abc', @SomeOtherValue = 7;

SELECT SomeID, SomeRandomValue, NotificationRequired FROM dbo.Parent WHERE SomeID = 1;
SELECT SomeOtherID, ParentID, SomeValue FROM dbo.Child WHERE ParentID = 1;
SELECT * FROM dbo.Notifications WHERE ParentID = 1;

NestedTransaction

Again, exactly the desired behaviour. The changes made in the outer procedure were committed, the changes in the inner procedure, the procedure where the error was thrown, were rolled back.

Used correctly, savepoints can be a powerful mechanism for managing transactions in SQL Server. Unfortunately they’re not well known and as such their use can also make code much harder for later developers to debug.

The next part of the series on transactions will look at why (and why not) you’d want to name a transaction.

The full list is:

  1. A Mess of Nested Transactions
  2. Savepoints and conditional transactions (This post)
  3. Why would you want to name a transaction?
  4. When naming transactions causes an error

A Mess of Nested Transactions

Transactions are an area that I often find are used badly, or not used at all. Transactions without any error handling, nested transactions, named transactions named for no good reason or just plain lack of transactions where they should be used.

This is going to be the first of three posts looking at some aspects of transactions and transaction management. And deserving of the first post is that most misunderstood of things, the nested transaction.

Before getting into the details, I need to make one thing clear. Nested transactions are a lie. They do not exist in SQL Server.

A piece of code may appear to have multiple nested transactions in it. It may have multiple BEGIN TRANSACTION statement in it. It only has one transaction though.

An easy way to show this would be to look at the transaction log. Since transactions are there to make multiple statements atomic (committed or rolled back as a unit), the transactions must be reflected in the transaction log to allow SQL to know where to roll things back to or what statements need to be undone/redone as a unit.

CREATE TABLE TestingTransactions (
SomeNumber INT,
SomeString VARCHAR(50)
);
GO

INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
VALUES  (0, 'Zero');

CHECKPOINT; -- mark the log as reusable

BEGIN TRANSACTION

	INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
	VALUES  (1, 'One');

	BEGIN TRANSACTION

		INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
		VALUES  (2, 'Two');

		BEGIN TRANSACTION

			INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
			VALUES  (3, 'Three');

		COMMIT TRANSACTION

	COMMIT TRANSACTION

COMMIT TRANSACTION

SELECT Operation, Context, [Transaction ID], AllocUnitName, [Transaction Name]
FROM fn_dblog(NULL, NULL);

GO

I’m inserting a row before the transaction because allocating the first page to a table generates a lot of log entries. Those log records are not relevant to the subject at hand and will just confuse things.

One table, three transactions, three inserts, three commits. Let’s see what went into the transaction log.

TransactionLogNestedTransactions

What went into the log was a single begin transaction, a single commit and three inserts. The other two begin transaction statements, and the other two commit transaction statements were completely ignored for the purposes of the transaction log.

The other two begin transaction statements did do something. Not very much, and nothing that reflected in the transaction log, but they did do one thing, and if I make the code a little more complicated it’ll become clear what that one thing is.

CHECKPOINT; -- mark the log as reusable

BEGIN TRANSACTION

	INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
	OUTPUT inserted.*
	VALUES  (1, 'One');
	SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran1;

	BEGIN TRANSACTION

		INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
		OUTPUT inserted.*
		VALUES  (2, 'Two');
		SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran2;

		BEGIN TRANSACTION

			INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
			OUTPUT inserted.*
			VALUES  (3, 'Three');
			SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran3;

		COMMIT TRANSACTION
		SELECT @@TRANCOUNT AS TransactionCountAfterCommitTran1;

	COMMIT TRANSACTION
	SELECT @@TRANCOUNT AS TransactionCountAfterCommitTran2;

COMMIT TRANSACTION
SELECT @@TRANCOUNT AS TransactionCountAfterCommitTran3;

TranCount

Each BEGIN TRANSACTION incremented the value returned by @@TranCount, each COMMIT TRANSACTION decremented the value returned by @@TranCount. If I looked at the transaction log after each statement, the BEGIN_XACT log entry corresponds to the BEGIN TRANSACTION which incremented @@Trancount from 0 to 1, and the COMMIT_XACT log entry corresponds to the COMMIT TRANSACTION which decremented @@Trancount from 1 to 0.

Only the operations which changed @@Trancount to or from 0 are reflected in the transaction log, they’re the only ones which have any effect on what becomes part of the database’s durable state.

Does ROLLBACK then follow the same rules? Does it get ignored unless it’s the last statement of multiple COMMIT/ROLLBACK statements?

CHECKPOINT; -- mark the log as reusable

BEGIN TRANSACTION

	INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
	OUTPUT inserted.*
	VALUES  (1, 'One');
	SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran1;

	BEGIN TRANSACTION

		INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
		OUTPUT inserted.*
		VALUES  (2, 'Two')
		SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran2;

		BEGIN TRANSACTION

			INSERT INTO dbo.TestingTransactions (SomeNumber, SomeString)
			OUTPUT inserted.*
			VALUES  (3, 'Three')
			SELECT @@TRANCOUNT AS TransactionCountAfterBeginTran3;

		ROLLBACK TRANSACTION
		SELECT @@TRANCOUNT AS TransactionCountAfterRollbackTran1;

	COMMIT TRANSACTION
	SELECT @@TRANCOUNT AS TransactionCountAfterCommitTran2;

COMMIT TRANSACTION
SELECT @@TRANCOUNT AS TransactionCountAfterCommitTran3;

SELECT Operation, Context, [Transaction ID], AllocUnitName, [Transaction Name]
FROM fn_dblog(NULL, NULL);

 

RollbackTransaction

TransactionLogRollbackTransaction

No. The ROLLBACK TRANSACTION set the value of @@Trancount directly to 0 and undid all of the inserts back to the outermost BEGIN TRANSACTION. The COMMIT TRANSACTION statements which ran after it all threw errors

Msg 3902, Level 16, State 1, Line 39
The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION.

This is the danger of nesting transactions, that one ROLLBACK, anywhere after the first BEGIN TRANSACTION will undo everything right the way back to that first BEGIN TRANSACTION. These are not autonomous transactions being run one inside another, there’s a single transaction and rollback anywhere will roll back everything.

This can be a debugging nightmare when the application creates a transaction and calls a procedure which begins a transaction which calls other procedures which begin and commit transactions. If something goes wrong in one procedure and that procedure rolls the transaction back, it can end up undoing a lot more work than the developer of that procedure may have expected.

In general I would recommend only starting and committing transactions at a single level, and making transaction handling part of the application’s design, not something that falls out of the implementation by accident. If there’s an intentional design around how transactions will be handled (and choosing not to have any is not a good idea), then it’s a lot less likely that there will be transaction-related bugs that need to be found and fixed.

And what happens when working with an existing app where this wasn’t done? I’ll look at that in the next post on transactions, on conditionally starting transactions and on the use of savepoints.

The full list is:

  1. A Mess of Nested Transactions (This post)
  2. Savepoints and conditional transactions
  3. Why would you want to name a transaction?
  4. When naming transactions causes an error

Goodbye IsNumeric hell

A well overdue feature introduced in Denali CTP 3 is that of the Try_Parse and Try_Convert functions. These are great for dealing with the something that’s frustrated SQL developers for years – data type conversions.

Let’s imagine a rather nasty case, a file with some values in it that once imported into SQL (as character data) looks something like this:

BadNumerics

Ewww… that’s a mess. Let’s see if we can identify which of the values can be converted into a numeric data type. The function prior to Denali for that was ISNUMERIC.

SELECT ANumber, ISNUMERIC(ANumber)
FROM BadNumerics   

IsNumeric

Great, so other than the obvious one, they’re all numeric data. Time to get converting.

SELECT CAST(ANumber as Numeric(18,6))
  FROM BadNumerics
  WHERE ISNUMERIC(ANumber) = 1;

Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to numeric.

Err, so they’re numeric but can’t be converted to numeric. That’s fun. Maybe another of the numeric data types will work…

(more…)

Converting OR to Union

When I looked at indexing for queries containing predicates combined with OR, it became clear that the are some restrictive requirements for indexes for the optimiser to consider using the indexes for seek operations.

  • Each predicate (or set of predicates) combined with an OR must have a separate index
  • All of those indexes must be covering, or the row count of the concatenated result set low enough to make key lookups an option, as the optimiser does not apparent to consider the possibility of doing key lookups for a subset of the predicates before concatenating the result sets.

So what can be done if it’s not possible to meet those requirements?

The standard trick is to convert the query with ORs into multiple queries combined with UNION. The idea is that since OR predicates are evaluated separately and the result sets concatenated, we can do that manually by writing the queries separately and concatenating them using UNION or UNION ALL. (UNION ALL can only be safely used if the predicates are known to be mutually exclusive)

CREATE TABLE Persons (
PersonID INT IDENTITY PRIMARY KEY,
FirstName    VARCHAR(30),
Surname VARCHAR(30),
Country CHAR(3),
RegistrationDate DATE
)

CREATE INDEX idx_Persons_FirstName ON dbo.Persons (FirstName) INCLUDE (Surname)
CREATE INDEX idx_Persons_Surname ON dbo.Persons (Surname) INCLUDE (FirstName)
GO

-- Data population using SQLDataGenerator

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel' OR Surname = 'Barnes'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel'
UNION
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Barnes'

In this case, the OR can be replaced with a UNION and the results are the same. The Union form is slightly less efficient according to the execution plan’s costings (60% compared to the OR at 40%), and the two queries have the same general form, with two index seeks and some form of concatenation and remove duplicates.

OrResult1
OrExecPlan1

So in that case it worked fine, although the original form was a little more efficient
(more…)

On Transactions, errors and rollbacks

Do errors encountered within a transaction result in a rollback?

It seems, at first, to be a simple question with an obvious answer. Transactions are supposed to be atomic, either the entire transaction completes or none of it completes.

Maybe too simple…

CREATE TABLE TestingTransactionRollbacks (
 ID INT NOT NULL PRIMARY KEY ,
 SomeDate DATETIME DEFAULT GETDATE()
 ) ;
GO
BEGIN TRANSACTION
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (1)
-- Fails. Cannot insert null into a non-null column
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (NULL)
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
-- fails. Duplicate key
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (3)
COMMIT TRANSACTION
GO
SELECT ID, SomeDate FROM TestingTransactionRollbacks
GO
DROP TABLE TestingTransactionRollbacks

If a transaction rolled back at the first failure, that final select would return no rows. But it doesn’t, it returns 3 rows. The failure of the individual statements was ignored and the transaction completed and committed. If that had been an important business process, not a made-up example, that could have some nasty consequences for transactional consistency of data.

What’s really going on here? Aren’t transactions supposed to be atomic? Isn’t SQL supposed to roll them back if they don’t complete successfully?

Well, kinda.

(more…)

A Trio of Table Variables

So, it’s the second Tuesday of the month again, and it’s time for T-SQL Tuesday again. TSQL2sDay150x150This month it’s hosted by Sankar Reddy and the topic is “Misconceptions in SQL Server

I thought I’d tackle a trio of table variable myths and partial truths.

Table Variables are memory-only

This one is pervasive and irritating. It typically goes like this:

You should use table variables rather than temp tables because table variables are memory only.

This myth can be broken down into two parts:

  1. That table variables are not part of TempDB
  2. That table variables are not written to disk

The first is easy to prove and has been done repeatedly. I’m not doing it again. I’m going to tackle the second portion only.

See, one could argue that, even though the table variable is created in the TempDB system tables and allocated pages within the TempDB data file, it is still kept entirely and only in memory. Let’s see if that’s true…

DECLARE @LargeTable TABLE (
id INT IDENTITY PRIMARY KEY,
LargeStringColumn1 CHAR(100),
LargeStringColumn2 CHAR(100)
)

INSERT INTO @LargeTable (LargeStringColumn1, LargeStringColumn2)
SELECT TOP (100000) 'Table Variable Test','T-SQL Tuesday!'
FROM master.sys.columns a CROSS JOIN master.sys.columns b

WAITFOR DELAY '00:01:00' -- so that the table var doesn't go out of scope and get deallocated too quickly.

This is not a massively large table. 100000 rows at 204 bytes per row (excluding header). A query of sys.dm_db_index_physical_stats (which does work on temp tables and table variables) reveals a total page count of 2632. That’s a grand total of 20.6 MB. 20 Megabytes. The SQL instance I’m running this on is allowed to use up to 2 GB of memory. No way on earth is this table variable going to cause any form of memory pressure (and I promise there is nothing else running)

So, run that code and, while that waitfor is running, do something that should never be done to a SQL server that you care anything about. (more…)