Come and gone

Or “Plan cache monitoring – insert and remove

Previously I took a look at the CacheHit and CacheMiss events to see how they behave and how to identify what’s been searched for in the cache. in this follow up, I want to take a similar look at the CacheInsert and CacheRemove events, see when they fire and how to identify the objects that they relate to.

Again, a word of caution, these can be frequently occurring events on busy servers and so traces should be kept short and to a minimum of events and columns. That said, these should occur a lot less often than the CacheHit and CacheMiss events. If they are occurring very frequently it may indicate that the SQL Server is not reusing plans efficiently.

CacheInsert

The CacheInsert event fires after a CacheMiss. The search for a matching plan in the cache failed, firing a CacheMiss event. Since there’s no plan, the optimiser is invoked to generate one and then that plan is inserted into the plan cache before the Query Execution engine begins execution.

The event is fairly simple, though of course there are a few surprises (what in SQL doesn’t have?).

Exec FireCacheEvents
GO

SELECT ID, SomeDate, Status
FROM TestingCacheEvents
WHERE Status =  'C'

CacheInsert

Continue reading ‘Come and gone’

Hit and miss

Or “Monitoring plan cache usage

For people interested in the details of how SQL is using and reusing execution plans, there are some useful events in profiler for watching this in detail, under the Stored procedure group:

  • SP:CacheMiss
  • SP:CacheInsert
  • SP:CacheHit
  • SP:CacheRemove
  • SP:Recompile
  • SP:StmtRecompile

Additionally there’s the SQL:StmtRecompile event under the TSQL group.

For now, I just want to look briefly at the CacheMiss and CacheHit events.

One word of caution early on, these are frequently occurring events and it may not be a good idea to trace these on busy production servers. If you do need to, keep the duration of the trace short and the columns to a minimum.

CacheEvents

CacheMiss

The cache miss event fires any time SQL looks for the execution plans for an object or ad-hoc batch and does not find it in the plan cache.

For an object (scalar function, multi-statement table-valued function, stored procedure or trigger) the match is done on the object ID (along with some of the connection’s SET options and possibly the database user and c couple other factors1). For an ad-hoc batch, the match is done on a hash of the text of the batch (along with some of the connection’s SET options and possibly the database user)

When testing stored procedures from Management Studio (or another SQL querying tool), two CacheMiss events will appear in the trace.

CacheMiss

What’s going on here?

Continue reading ‘Hit and miss’

Does a missing data file send a database suspect?

Short answer: No

I keep seeing this come up on various forums when people ask what makes a database go suspect or what could have caused their database to be marked suspect. I can almost guarantee when that question comes up, one or more people will answer ‘missing files’. That may have been true on earlier versions (I don’t have a SQL 2000 or earlier instance to play with), but it is not true in the current versions of SQL. A missing file may result in the database being inaccessible (depending what file), but it will not result in a suspect database.

Let’s prove it.

I’m going to create a database with three files, two of which are in the primary filegroup, with one user table on each filegroup. (T-SQL code at the end) Before each test I’ll begin a transaction, modify data in both tables, shut SQL down so that there’s an uncommitted transaction in the log (database cannot be cleanly shut down), then rename a file before restarting SQL and see what happens.

I’m not going to play with the transaction log. That I’ve done before. In SQL 2005/2008, if the transaction log is missing and the database was cleanly shut down, SQL will recreate it. If the the transaction log is missing and the database was not cleanly shut down, the database goes into the RECOVERY_PENDING state, so no SUSPECT here.

Let’s try the file in the secondary filegroup first.

Continue reading ‘Does a missing data file send a database suspect?’

DBA Hell

On the first day of DBA hell, the server gave to me
A database with damaged system tables and no good backups (1)

On the second day of DBA hell, the server gave to me
Two databases with widespread corruption, no backups (1, 2)

On the third day of DBA hell, the server gave to me
Three suspect databases, no backups (1,2,3)

On the fourth day of DBA hell, the server gave me nothing, cause I didn’t have a job any longer…

How does one end up with a critical production database that has no backups? I could kinda understand if the backups were damaged, if the corruption went undetected for long enough that it was in the backups as well, but to have no backups at all? Of an important database?

The only excuse for having no backups is if the database can be trivially and completely recreated from another source with minimal impact to the users. This is not the normal scenario.

There’s an immense amount of information available on backup and restore strategies.

That’s just a quick list, there’s far more information available than that. Enough that there’s really no good excuse to not have backups when they’re needed.

As Steve Jones (blog|twitter) is fond of saying “Good backup, good resume. You only need one”

Redgate's Exceptional DBA competition

Redgate’s Exceptional DBA competition is back for a third year! I was one of the judges for this last year and, while I’m not judging it this year, I do have some advice for anyone considering entering.

Be Explicit and detailed

This is not a competition won by luck. There are no dice rolled, no coins tossed, no numbers drawn from a hat.

As an entrant, you need to convince the judges that you (or the person you are nominating) are the best of the best. The only thing that you can use to do that are the answers on the entry form.

The more the better (within reason). To give an idea, last year the answers to one question (What’s the hallmarks of an exceptional DBA?) ranged from one word to half a page. Which of those two do you think the judges rated higher?

If you can, get a colleague to read over your answers before submitting them. Ask them for their opinion, ask them if there are any pieces that they’d change or add to show you (or the person you are nominating) in their very best light.

Spell check

Please, please, please run a spell check and grammar check over your entries before submitting. This goes double if English (or American) is not your first language. There is nothing that makes an entry look bad more than por speeling thet teh jugdes mast spand tyme desifering.

No, not all of us speak English fluently, but there are enough grammar and spell checkers available (hint Firefox includes one if you download the dictionary) that not bothering shows a lack of interest and professionalism. Besides, if the judges can’t work out what you’re saying, they’re not going to rate your entry highly.

On this point, watch the l33t speak and SMS/twitter style word-shortening. They’re harder to read that fully written out words, and space is not at a premium for these entries. Again, you should be trying to show that you are a professional, much like you would when writing up a CV.

Watch the humour

What’s funny for one person may be annoying or offensive to another. A joke about ‘cleaning up after those incompetent developers’ may not be funny to a judge who is a developer or comes from a development background. Again, keep it professional, imagine that these answers are going to be seen by the CIO/owner/MD of the company you work for.

Along the same lines, funny answers aren’t. One entry last year, for the question “Why do you deserve to win?”, gave as an answer “No idea ;-)

Well dude, if you don’t know why you should win, I sure as hell don’t.

In Conclusion

If you’re planning to enter this competition, you have one chance to make an impression with the judges – your answers. Make it the best impression that you can.

In, Exists and join – a roundup

Over the last several months I’ve had a look at IN, Exists, Join and their opposites to see how they perform and whether there’s any truth in the advice that is often seen on forums and blogs advocating replacing one with the other.

Previous parts of this series can be found:

In this roundup post, I’m going to do multiple tests on the 6 query forms, with different numbers of rows, indexes, no indexes and, for the negative forms (NOT IN, NOT EXISTS), nullable and non-nullable join columns.

In the individual tests, I used 250000 rows in the first table and around 3000 rows in the secondary table. In this roundup, I’m going to use 3 different row counts, 1000000 rows, 50000 rows and 2500 rows. That should give a reasonable idea for performance at various table sizes. (Not much point in going smaller than 2500 rows. Everything’s fast on 100 rows)

Some notes on the tests.

  • The version of SQL is SQL Server 2008 SP1 x64 Developer Edition.
  • The tests were run on a laptop. Core-2 Duo, 3 GB memory. SQL limited to 1 processor, so no parallelism possible.
  • Each query will be run 10 times, reads, cpu and duration measured by profiler and averaged.
  • Each query will be run once before the tests start to ensure that the data is in cache and the execution plans are generated and cached.
  • Reproduction scripts will be available for download.

Continue reading ‘In, Exists and join – a roundup’

Running SQL faster

Poor performance is not uncommon in most SQL Server environments. The data sizes are growing, the hardware isn’t and traditional methods of performance tuning are time-consuming and difficult.

So what’s the solution? Well, throwing hardware at the problem is an old favourite. There are few workloads that a nice 256-processor Itanium with a terabyte or two of memory won’t handle, but servers like that are a little on the expensive side and lots of money spent on expensive hardware means less that can be spent on annual bonuses.

There is another option, a hidden, undocumented option that can improve query performance, maybe a little, maybe substantially.

First thing that you need to do to get this one is to enable the hidden options in sp_configure. This is done much the same way as the advanced options.

exec sp_configure 'show hidden options', 1
RECONFIGURE WITH EXTREME OVERRIDE

Once that’s done, the undocumented option can be enabled.

exec sp_configure 'run queries faster', 101010
 RECONFIGURE WITH EXTREME OVERRIDE

How much improvement this will give depends on the kind of queries being run. OLTP systems usually see a greater improvement than decision-support, unless there’s full text search or spatial queries, in which case there will likely be substantially less of a gain.

Now, there are a few things to consider.

  1. This is obviously undocumented and that means unsupported.
  2. It may not work on the next version of SQL.
  3. If you call support, disable the option first and don’t tell them you were running it!

.

Happy April Fools’ day.

.

..

….

Seriously now, there’s no options that, when enabled, makes SQL run queries faster. There is no silver bullet for performance problems, there is no one-size-fits-all fix.

Fixing performance problems involves finding the current bottleneck and removing it, then repeating that operation until performance is acceptable. It’s a complex area and there’s a lot to it. Simply throwing hardware at the problem may not produce much, if any, performance gain, especially if the hardware wasn’t the bottleneck.

If you have a query performance problem and don’t know where to start, ask on one of the SQL forums (like SQLServerCentral) if it’s not an urgent problem. If it is, or if there are serious problems, consider getting a consultant in to help out. One of the quickest ways to learn is to learn from someone who knows what they are doing.

Left outer join vs NOT EXISTS

And to wrap up the miniseries on IN, EXISTS and JOIN, a look at NOT EXISTS and LEFT OUTER JOIN for finding non-matching rows.

For previous parts, see

I’m looking at NOT EXISTS and LEFT OUTER JOIN, as opposed to NOT IN and LEFT OUTER JOIN, because, as shown in the previous part of this series, NOT IN behaves badly in the presence of NULLs. Specifically, if there are any NULLs in the result set, NOT IN returns 0 matches.

The LEFT OUTER JOIN, like the NOT EXISTS can handle NULLs in the second result set without automatically returning no matches. It behaves the same regardless of whether the join columns are nullable or not. Seeing as NULL does not equal anything, any rows in the second result set that have NULL for the join column are eliminated by the join and have no further effect on the query.

It is important, when using the LEFT OUTER JOIN … IS NULL, to carefully pick the column used for the IS NULL check. It should either be a non-nullable column (the primary key is a somewhat classical choice) or the join column (as nulls in that will be eliminated by the join)

Onto the tests

The usual test tables…

CREATE TABLE BigTable (
id INT IDENTITY PRIMARY KEY,
SomeColumn char(4) NOT NULL,
Filler CHAR(100)
)

CREATE TABLE SmallerTable (
id INT IDENTITY PRIMARY KEY,
LookupColumn char(4) NOT NULL,
SomeArbDate Datetime default getdate()
)

INSERT INTO BigTable (SomeColumn)
SELECT top 250000
char(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) +
char(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))
from master.sys.columns a cross join master.sys.columns b

INSERT INTO SmallerTable (LookupColumn)
SELECT DISTINCT SomeColumn
FROM BigTable TABLESAMPLE (25 PERCENT)
-- (3918 row(s) affected)

First without indexes

-- Query 1
SELECT BigTable.ID, SomeColumn
	FROM BigTable LEFT OUTER JOIN SmallerTable ON BigTable.SomeColumn = SmallerTable.LookupColumn
	WHERE LookupColumn IS NULL

-- Query 2
SELECT ID, SomeColumn FROM BigTable
WHERE NOT EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)

Let’s take a look at the execution plans

LeftOuterJoinNotIN_NotIndexed

Continue reading ‘Left outer join vs NOT EXISTS’

The Root of all Evil

Or “Shot gun query tuning

There have been a fair few forums questions in recent months asking for help in removing index scans, loop joins, sorts or other, presumed, slow query operators. There have been just as many asking how to replace a subquery with a join or a join with a subquery or similar aspects of a query usually for performance reasons.

The first question that I have to ask when looking at requests like that is “Why?”

Why is removing a particular query operator the goal? Why is changing a where clause predicate the goal? If it’s to make the query faster, has the query been examined and has it been confirmed that query operator or predicate really is the problem?

The title of this post refers to a comment I’ve seen again and again in blogs or articles about front-end development. “Premature optimisation is the root of all evils.” It’s true in the database field as well.

Continue reading ‘The Root of all Evil’

NOT EXISTS vs NOT IN

Continuing with the mini-series on query operators, I want to have a look at NOT EXISTS and NOT IN.

Just one note before diving into that. The examples I’m using are fairly simplistic and that’s intentional. I’m trying to find what, if any, are the performance differences in a benchmark-style setup. I’ll have some comments on more complex examples in a later post.

The most important thing to note about NOT EXISTS and NOT IN is that, unlike EXISTS and IN,  they are not equivalent in all cases. Specifically, when NULLs are involved they will return different results. To be totally specific, when the subquery returns even one null, NOT IN will not match any rows.

The reason for this can be found by looking at the details of what the NOT IN operation actually means.

Let’s say, for illustration purposes that there are 4 rows in the table called t, there’s a column called ID with values 1..4

WHERE SomeValue NOT IN (SELECT AVal FROM t)
is equivalent to
WHERE (
SomeValue != (SELECT AVal FROM t WHERE ID=1)
AND
SomeValue != (SELECT AVal FROM t WHERE ID=2)
AND
SomeValue != (SELECT AVal FROM t WHERE ID=3)
AND
SomeValue != (SELECT AVal FROM t WHERE ID=4)
)

Let’s further say that AVal is NULL where ID = 4. Hence that != comparison returns UNKNOWN. The logical truth table for AND states that UNKNOWN and TRUE is UNKNOWN, UNKNOWN and FALSE is FALSE. There is no value that can be AND’d with UNKNOWN to produce the result TRUE

Hence, if any row of that subquery returns NULL, the entire NOT IN operator will evaluate to either FALSE or NULL and no records will be returned

So what about EXISTS?

Continue reading ‘NOT EXISTS vs NOT IN’