SQL Server

Compiles and recompiles

I want to spend some time over the next few months looking at query compilation and the plan cache, and there are a couple of concepts that I want to get cleared up beforehand. The first of those is around two terms that are often used interchangeably: compile and recompile.

Compile

A compile occurs when a query is given to the query optimiser and, when it does a lookup into the plan cache, no matching plan is found. The optimiser must then compile the query, generating an execution plan, add that plan to the plan cache (in most cases) and pass that plan on to the query execution engine so that the query can be executed. (http://technet.microsoft.com/en-us/library/Cc966425)

Recompile

A recompile is something slightly different. For a recompile, the optimiser finds a matching plan when it queries the plan cache and hands that cached plan over to the query execution engine, and then, while doing validation checks, the execution engine determines that the query plan is no longer valid and requests that the optimiser partially or completely recompile the query. (http://technet.microsoft.com/en-us/library/Cc966425)

Subtle difference. Both cases result in the optimiser generating an execution plan, but the reasons can be different. Also worth noting is that a compile results in a new plan in the cache, while a recompile simply replaces an existing plan.

Another difference since SQL 2005 – a compile is for the entire batch, but a recompile can be for just a single statement within the batch.


Now that the theory's dealt with, let's look at some examples, see how we can track these two events, and try to get a better understanding of which occurs when and how they look.

The tools I'm going to use to track these are Performance Monitor, with the compiles/sec and recompiles/sec counters, and SQL Profiler, with the SP:StmtRecompile event (there's no Profiler event for compilation). I'll also check what's in the plan cache after each test.
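Checking the plan cache is easy enough with the plan cache DMVs. A minimal sketch of one way to do that check (the exact columns and filtering are up to you):

SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%OutStandingTotalByStatus%' -- this filter will also match the DMV query itself; ignore that row

The usecounts column is what shows how many times a cached plan has been reused.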

The first one's going to be very simple: a query run against an empty plan cache.

DBCC FREEPROCCACHE
GO

EXEC dbo.OutStandingTotalByStatus
GO

What we get from that is a non-zero value for SQL Compilations/sec (perfmon) and the following from Profiler (the SQL Recompiles/sec counter remains 0):

[Image: Compiles1]

and the plan cache now contains one plan with one use. (For more info on how the CacheMiss and CacheInsert events work, see http://www.sqlinthewild.co.za/index.php/2010/07/27/hit-and-miss/ and http://www.sqlinthewild.co.za/index.php/2010/08/31/come-and-gone/.)

[Image: Compiles2]

In this case, I hope it was clear that we had a compile occur (empty plan cache before, new plan added to the cache).

Now what happens if, with no clearing of the cache nor anything else being done, I mark that procedure for recompile and run it again?
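Something like this (a sketch using sp_recompile, which flags the procedure so that its cached plan is invalidated before the next execution):

EXEC sp_recompile 'dbo.OutStandingTotalByStatus'
GO

EXEC dbo.OutStandingTotalByStatus
GO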

(more…)

All about Execution Plans

Coming to the PASS Summit in October this year? Excellent!

I say excellent, because Grant (blog|twitter) and I are presenting a full-day seminar on the Monday, all about execution plans. Excited already? If not, let me give you a taste of what we’ll be discussing.

Grant will be kicking things off with a look at how the optimiser works. Not 500 level deep internals (that’d take all day by itself), but an overview of how the optimiser turns this

SELECT t.Title, u.UserName, u.UserLevel, t.LastModified, t.LastPoster
FROM Threads t
INNER JOIN Posts p ON t.ThreadID = p.ThreadID
INNER JOIN Users u ON p.Poster = u.UserID
WHERE t.ThreadID = 42

into this

[Image: QueryIntoExecPlan]

as well as some DMVs that give a view on what the optimiser is doing.

After that, I’ll be digging into what the plan cache is, how it’s structured and how plans are matched to incoming queries. I’ll also cover how you might go about monitoring SQL’s usage of the cache.

After that we'll discuss the myriad and varied ways to get execution plans out of SQL, what the various options return, and what some of the terms involved really mean (estimated execution plans, anyone?).
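As a taste, here's a quick sketch of just two of those options, reusing the Threads query from above: SET SHOWPLAN_XML returns the estimated plan without executing the query, while SET STATISTICS XML executes the query and returns the actual plan.

SET SHOWPLAN_XML ON
GO
SELECT t.Title FROM Threads t WHERE t.ThreadID = 42 -- not executed, estimated plan only
GO
SET SHOWPLAN_XML OFF
GO

SET STATISTICS XML ON
GO
SELECT t.Title FROM Threads t WHERE t.ThreadID = 42 -- executed, actual plan returned
GO
SET STATISTICS XML OFF
GO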

Once all that's out there (which will probably take the entire morning) it'll be on to the fun of reading the execution plans: what those various icons represent and what all their arcane properties are telling you. Maybe, just maybe, we'll also have a look at the raw XML form of the plan, just for fun.

And because theory’s not very useful without something to apply it to, we’ll be ending off the day by seeing how you can use the information from the execution plan to tune your queries. After all, everyone wants better performance, right?

Still not completely convinced? Then check out the 24 Hours of PASS coming in September. Both Grant and I are doing a pre-con preview: I'm doing a session on extracting execution plans and Grant's doing a portion of the reading-execution-plans section.

Right, now that everyone's dying to attend this, I'm going to toss out a little bit of a challenge. If you're coming to our precon and you have an execution plan that you don't understand (maybe a strange operator, maybe a property with a value you've never seen), mail it to me ([email protected]<blog domain>) with a note as to what you want clarified, and we might discuss it during the pre-con. Some rules, just to keep things sane:

  • No confidential stuff. I don’t want legal trouble, neither do you.
  • If the exec plan takes 5 minutes to open in SSMS, I’m deleting it.
  • If it’s multiple pages in SSMS, I’m deleting it.
  • I don’t promise to look at anyone’s plans, it depends how we’re doing on time.

Goodbye IsNumeric hell

A well-overdue feature introduced in Denali CTP 3 is the Try_Parse and Try_Convert functions. These are great for dealing with something that's frustrated SQL developers for years – data type conversions.

Let’s imagine a rather nasty case, a file with some values in it that once imported into SQL (as character data) looks something like this:

[Image: BadNumerics]

Ewww… that’s a mess. Let’s see if we can identify which of the values can be converted into a numeric data type. The function prior to Denali for that was ISNUMERIC.

SELECT ANumber, ISNUMERIC(ANumber)
FROM BadNumerics   

[Image: IsNumeric]

Great, so other than the obvious one, they’re all numeric data. Time to get converting.

SELECT CAST(ANumber as Numeric(18,6))
  FROM BadNumerics
  WHERE ISNUMERIC(ANumber) = 1;

Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to numeric.

Err, so they’re numeric but can’t be converted to numeric. That’s fun. Maybe another of the numeric data types will work…
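This is exactly the kind of mess the new functions address. A quick sketch of the Denali approach (Try_Convert returns NULL, rather than raising an error, for values that can't be converted):

SELECT ANumber, TRY_CONVERT(NUMERIC(18,6), ANumber) AS Converted
FROM BadNumerics -- unconvertible values come back as NULL, no error raised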

(more…)

SQLSkills Immersion training in London

Just short of the winter solstice, I bailed out of a freezing cold Johannesburg for warmer climates; well, actually for London where the weather could almost have been mistaken for a South African winter, except much wetter.

However, I wasn't going to London for the weather (fortunately), nor even for some sightseeing; I was going to London for some SQL training. Not just any SQL training, but some seriously deep training from among the best trainers in the world: one of Paul Randal and Kimberly Tripp's five-day Immersion courses. And immersion it most certainly was.

It gives a good feel for the intensity and depth of the course that, after all the introductions and housekeeping on the Monday morning were done, Paul started off by diving straight into the structure of pages and rows. Start slowly, where's the fun in that? By lunchtime I had on the order of 20 pages of notes. The reasoning behind starting with that is that the internal structures come up again and again and again in the later material, so a firm understanding of how things are put together at the lowest level makes it easier to understand the features that are built on top.

The material covered is detailed on the SQLSkills website (http://www.sqlskills.com/T_ImmersionInternalsDesign.asp), so I'm not going to waste space by listing it all again. The material was all covered in incredible detail, and any point could be, and frequently was, expanded on if any of the students asked.

On the subject of questions, there was not a single one, over all five days, that Paul and Kimberly could not answer with at most a couple of minutes of research (mostly for things like kb article numbers). Questions and comments were encouraged and often the discussions were as valuable as the main course material.

By the Friday, all the students were looking a little worn out. Paul, of course, was still as vibrant as ever, to the point of heckling (in good fun naturally) someone who crawled in late on the Friday morning.

All in all that was a fantastic experience, and that was just week one out of the four. I’d really like to thank Paul and Kimberly for coming across to London and giving people who can’t make it to the US classes an opportunity to take one of their Immersion Courses. I certainly hope that they plan to make a return visit soon.


Converting OR to Union

When I looked at indexing for queries containing predicates combined with OR, it became clear that there are some restrictive requirements that must be met before the optimiser will consider using indexes for seek operations.

  • Each predicate (or set of predicates) combined with an OR must have a separate index
  • All of those indexes must be covering, or the row count of the concatenated result set must be low enough to make key lookups an option, as the optimiser does not appear to consider the possibility of doing key lookups for a subset of the predicates before concatenating the result sets.

So what can be done if it’s not possible to meet those requirements?

The standard trick is to convert the query with ORs into multiple queries combined with UNION. The idea is that since OR predicates are evaluated separately and the result sets concatenated, we can do that manually by writing the queries separately and concatenating them using UNION or UNION ALL. (UNION ALL can only be safely used if the predicates are known to be mutually exclusive)

CREATE TABLE Persons (
PersonID INT IDENTITY PRIMARY KEY,
FirstName    VARCHAR(30),
Surname VARCHAR(30),
Country CHAR(3),
RegistrationDate DATE
)

CREATE INDEX idx_Persons_FirstName ON dbo.Persons (FirstName) INCLUDE (Surname)
CREATE INDEX idx_Persons_Surname ON dbo.Persons (Surname) INCLUDE (FirstName)
GO

-- Data population using SQLDataGenerator

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel' OR Surname = 'Barnes'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel'
UNION
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Barnes'

In this case, the OR can be replaced with a UNION and the results are the same. The UNION form is slightly less efficient according to the execution plan's costings (60% compared to the OR's 40%), and the two queries have the same general form: two index seeks, some form of concatenation, and a duplicate-removal operation.

[Image: OrResult1]
[Image: OrExecPlan1]

So in that case it worked fine, although the original form was a little more efficient.
(more…)

Are all updates split into delete-insert?

This should be another quick one.

Earlier I saw a forum post where someone asserted that SQL always executes an update as a delete of the row followed by an insert of the row with the new values. I'm always glad of another excuse to play with undocumented features, so let's see if that's true (or at least true for all the common cases I try).

First up, a heap, no indexes at all, an update that does not change the size of the row.

CREATE TABLE TestingUpdate1 (
ID INT IDENTITY,
SomeString CHAR(50)
)

INSERT INTO TestingUpdate1 (SomeString)
VALUES
('One'),('Two'),('Three'),('Four'),('Five'),('Six'),('Seven'),('Eight'),('Nine')

CHECKPOINT -- truncate the log, DB is in simple recovery.

UPDATE TestingUpdate1
SET SomeString = 'NotFour'
WHERE ID = 4 -- one row

SELECT Operation, Context, AllocUnitName, [Transaction Name], Description FROM fn_dblog(NULL, NULL) AS TranLog

DROP TABLE TestingUpdate1

[Image: UpdateHeapFixedSize]

The log operation here is Modify Row, so in this case the update was done as an in-place update.

Second test: a heap, no indexes at all, and an update that changes the row size (though with plenty of free space on the page, so no issues with forwarding pointers here).
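Following the pattern of the first test, a sketch of that setup (assumed names; VARCHAR rather than CHAR so that the update actually changes the row size):

CREATE TABLE TestingUpdate2 (
ID INT IDENTITY,
SomeString VARCHAR(50)
)

INSERT INTO TestingUpdate2 (SomeString)
VALUES
('One'),('Two'),('Three'),('Four'),('Five'),('Six'),('Seven'),('Eight'),('Nine')

CHECKPOINT -- truncate the log, DB is in simple recovery.

UPDATE TestingUpdate2
SET SomeString = 'NotFour, but rather longer than before' -- size-changing update
WHERE ID = 4 -- one row

SELECT Operation, Context, AllocUnitName, [Transaction Name], Description FROM fn_dblog(NULL, NULL) AS TranLog

DROP TABLE TestingUpdate2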

(more…)

T-SQL Tuesday: Disasters don’t just come in huge

So we're supposed to talk about disasters we've had or discuss disaster recovery technologies. I'm going to take a slightly different approach…

<soapbox>

I'm in the fortunate position of living in a region of the world that's relatively free of natural disasters. We're reasonably geologically stable; the nearest fault lines are the Great Rift Valley and somewhere in the Antarctic ocean. We don't get tornadoes, and we're a long way from the ocean (and Madagascar partially protects the coast from typhoons and tsunamis).

Given that, and looking at the recent events in Japan and Southern USA, local IT managers might be grateful that their critical infrastructure is here, not there. But that is no reason for complacency, no reason to ignore putting a disaster recovery plan in place.

Huge national disasters, while they attract a whole lot of attention (and rightly so), are probably not the main cause of IT disasters. IT disasters, for the most part, are likely to be caused by smaller events like these (1):

  • A drive in the RAID 5 array fails, and the SAN admin switches out the wrong drive.
  • A SAN controller fails and overwrites a LUN or 2 with binary garbage.
  • The server room floor collapses dumping the SAN 2 floors down into a garage, with the server on top of it.
  • A water leak a floor above the server room results in the UPS getting a shower, and the resultant power surge fries the servers' main boards.
  • A developer with far too many permissions truncates an important table on the production server, thinking he was working on the development environment.
  • The mains power fails but the generators don’t come online because their fuel was drained a day earlier in preparation for maintenance.

Events like those (or probably even more mundane ones) are the ones that we need to plan for: relatively minor disasters that can leave a business without critical infrastructure or critical data for hours or days.

You need to plan for the small disasters as well as the large ones. Plan for the dropped table. Plan for two drives failing in the RAID 5 array. Plan for the server’s power supply failing. Plan for the big disasters too, just don’t think that they’re the only thing that endangers your data and your business.

(1) I've personally seen three of those events happen, I've heard from people who have seen two more, and I know of a company that's at risk of one. They're not just made-up, improbable occurrences.

On the exorcism of Ghost Records

There was a question earlier on one of the SQL forums as to whether or not Ghost Cleanup overwrote the deleted rows when it ran so as to ensure no one could read that data again.

Now I could just reference Paul Randal's blog post where he described Ghost Cleanup in depth and leave it there, but where would the fun be in that?

So, now that everyone’s read Paul’s blog post, time for some fun with undocumented commands to prove (again) what we already know (and what Paul’s already shown).

Setup code:

CREATE TABLE TestingCleanup (
ID INT IDENTITY PRIMARY KEY,
Description VARCHAR(20),
Filler CHAR(50) DEFAULT ''
);
GO

INSERT INTO TestingCleanup (Description)
VALUES ('One'), ('Two'), ('Three'), ('Four'), ('Five'), ('Six'), ('Seven'), ('Eight'), ('Nine'), ('Ten')

-- Find the page number that the table is on

SELECT OBJECT_ID('TestingCleanup') -- 1399012065
DBCC IND(11,1399012065,1)
-- The IAM is 1:309 and the data page is 1:308

Now I’m going to delete half the table and then go and look at the page. The ghost cleanup can’t run until the transaction commits, so running this within a transaction gives a chance to go and look at how things work.

BEGIN TRANSACTION
DELETE FROM TestingCleanup WHERE ID%2 = 0 -- delete the even rows
SELECT ID, Description FROM TestingCleanup -- 5 rows
DBCC TRACEON (3604)
DBCC PAGE(11,1,308,1) -- the data page identified by DBCC IND above
DBCC TRACEOFF (3604)
COMMIT TRANSACTION

The delete removes all the rows with even identity values and the select returns only 5 rows, as expected.

Dump type 1 for DBCC Page gives the header and then each row separately in binary. I’m editing out uninteresting parts of the output to keep things manageable.

(more…)

On Transactions, errors and rollbacks

Do errors encountered within a transaction result in a rollback?

It seems, at first, to be a simple question with an obvious answer. Transactions are supposed to be atomic, either the entire transaction completes or none of it completes.

Maybe too simple…

CREATE TABLE TestingTransactionRollbacks (
 ID INT NOT NULL PRIMARY KEY ,
 SomeDate DATETIME DEFAULT GETDATE()
 ) ;
GO
BEGIN TRANSACTION
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (1)
-- Fails. Cannot insert null into a non-null column
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (NULL)
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
-- fails. Duplicate key
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (2)
-- succeeds
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (3)
COMMIT TRANSACTION
GO
SELECT ID, SomeDate FROM TestingTransactionRollbacks
GO
DROP TABLE TestingTransactionRollbacks

If a transaction rolled back at the first failure, that final select would return no rows. But it doesn't; it returns 3 rows. The failure of the individual statements was ignored and the transaction completed and committed. If that had been an important business process, not a made-up example, that could have had some nasty consequences for the transactional consistency of the data.

What’s really going on here? Aren’t transactions supposed to be atomic? Isn’t SQL supposed to roll them back if they don’t complete successfully?

Well, kinda.
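One way to change this behaviour (a sketch, and only one of the options) is SET XACT_ABORT ON, which makes most runtime errors terminate the batch and roll back the entire transaction:

SET XACT_ABORT ON
GO
BEGIN TRANSACTION
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (1)
-- with XACT_ABORT ON, this error kills the batch and rolls the whole transaction back
INSERT INTO TestingTransactionRollbacks (ID)
 VALUES (NULL)
COMMIT TRANSACTION
GO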

(more…)

Indexing for ORs

All of the indexing strategy posts I've written in the past have been concerned with predicates combined with ANDs. That's only one half of the possibilities though. There's the case of predicates combined with ORs, and the guidelines for indexing that work well with ANDs don't work with ORs.

When dealing with predicates combined with AND, the predicates are cumulative, each one operates to further reduce the resultset.

For this reason, multi-column indexes support multiple predicates combined with AND operators.

Let's look at a quick example.

CREATE TABLE Customers (
  CustomerID INT IDENTITY PRIMARY KEY,
  Surname VARCHAR(30) NOT NULL,
  FirstName VARCHAR(30),
  Title VARCHAR(5),
  CustomerType CHAR(1) NOT NULL,
  IsActive BIT DEFAULT 1 NOT NULL,
  RegistrationDate DATETIME NOT NULL DEFAULT GETDATE()
);
CREATE INDEX idx_Customers_SurnameFirstName ON Customers (Surname, FirstName);

Again I’m going to be lazy and get SQLDataGenerator to generate a few rows.

With that two-column index on those columns and a query that looks for Surname = 'Kelley' AND FirstName = 'Rick', SQL can do a double-column seek to go directly to the start of the range and then just read down the index to the end of the range, basically until it finds the first row that it's not interested in.
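To make that concrete (a sketch; it simply mirrors the OR query below with AND instead):

SELECT CustomerID
  FROM Customers
  WHERE Surname = 'Kelley' AND FirstName = 'Rick';

Both predicates here can be satisfied by a single seek on idx_Customers_SurnameFirstName.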

So how does that differ when the operator is an OR?

The main difference is that with an OR, the predicates are independent. The second doesn't serve to reduce the recordset, but rather to expand it. It's similar to evaluating two separate predicates and combining the result. Let's have a look at that two-column index again when the two predicates are combined with an OR.

SELECT CustomerID
  FROM Customers
  WHERE Surname = 'Kelley' OR FirstName = 'Rick';

If we try to use that index to evaluate Surname = 'Kelley' OR FirstName = 'Rick', there's a problem. While the first of those predicates can be evaluated with a seek (it's a sargable predicate on the left-most column of an index), the second predicate cannot. It's sargable, but it is on the second column of the index (and for the moment let's assume there are no other indexes on the table). Seeks are only possible if the predicate filters on a left-based subset of the index key.

Hence, to evaluate that predicate, SQL will have to do an index scan. And since it has to do a scan to evaluate the second predicate, it won't bother also doing a seek to evaluate the first, as it can evaluate both during the scan.

Hence, in this case, the query will execute with a single index scan.

[Image: IndexScanWithOr]

So how do we get this query to seek rather than scan?
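Going by the requirements listed in the Converting OR to Union post above, one likely piece of the answer (a sketch, not necessarily the whole story) is to give the second predicate its own seekable index:

CREATE INDEX idx_Customers_FirstName ON Customers (FirstName);

With an index per predicate, the optimiser can seek both sides and concatenate the results rather than scanning the whole table.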

(more…)