SQL Server

Corruption in sys.sql_dependencies

The second type of corruption I want to look at is that of missing references in sys.sql_dependencies. This typically appears on SQL 2005 in a database upgraded from SQL 2000.

On SQL 2000 it was possible to make direct updates to the system tables and, in my opinion, it was done far too often without the DBAs realising the long-term consequences.

As an interesting aside, early in 2005 I attended a series of pre-launch courses on SQL 2005. Among the other attendees was a senior DBA from one of our large banks. When he heard that 2005 hid the system tables and that it was no longer possible to update them, he went on a 10 minute rant about how MS was making his life impossible and removing essential features, etc, etc. It turned out that he did direct updates to drop users, drop logins, link logins to users, drop columns from tables and several other things. He absolutely would not accept that there were other ways to do those tasks and that modifying the system tables was risky. I’d hate to see the state of that database…

Back to the point…

(more…)

Stats blob corruptions

There have been a couple of odd types of corruption that I’ve been seeing on the forums. I want to take the time to go into some detail, because the fix is possibly not obvious. The first one that I want to look at is corruption of a stats blob.

These errors are specific to SQL 2000; I don’t know what the equivalent errors in SQL 2005 look like, if there are equivalent errors at all.

The errors that indicate corruption in the stats blob look like this:

Server: Msg 8929, Level 16, State 1, Line 4 Object ID 2:
Errors found in text ID 153175064576 owned by data record identified by RID = (1:526:13) id = 238623893 and indid = 1.
Server: Msg 8929, Level 16, State 1, Line 4 Object ID 2:
Errors found in text ID 153175195648 owned by data record identified by RID = (1:526:15) id = 50815243 and indid = 3.

The things to note are the Object ID (2 being sysindexes in SQL 2000) and the reference to an indid in the section that identifies the record.

In SQL 2000, the statistics for an index (or indeed for column statistics) were stored in an image column in sysindexes called statblob (ref). Each index (and statistic) has a row in sysindexes and keeps its statistics data in that image column. This means that the statblob is just as susceptible to corruption as any other LOB column. Unfortunately it’s not as easily fixed. In fact, since it’s a corruption in the system tables, CheckDB will not repair it.
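As a rough sketch (SQL 2000 only), the affected rows can be located in sysindexes using the id and indid values reported in the error messages, here the two ids from the sample output above:

```sql
-- SQL 2000 only: locate the sysindexes rows flagged by CheckDB.
-- The id values come from the error messages quoted above.
SELECT id, indid, name, DATALENGTH(statblob) AS StatBlobBytes
FROM sysindexes
WHERE id IN (238623893, 50815243)
```

The name column then tells you which index or statistic owns the corrupt stats blob.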

All well and good, so how do we fix these?

(more…)

EXISTS vs IN

This one’s come up a few times recently, so I’ll take a look at it.

The EXISTS and IN clauses at first glance look fairly similar. They both use a subquery to evaluate rows, but they do it in slightly different ways.

IN does a direct match between the column specified before the IN keyword and the values returned by the subquery. When using IN, there can only be a single column specified in the select clause of the subquery.

Let’s have a look at a quick example

DECLARE @SomeTable TABLE (IntCol int)
Insert into @SomeTable (IntCol) Values (1)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (3)
Insert into @SomeTable (IntCol) Values (4)
Insert into @SomeTable (IntCol) Values (5)

SELECT *
FROM BigTable
WHERE SomeColumn IN (Select IntCol FROM @SomeTable)

So this query returns all the rows from BigTable where SomeColumn has any of the values returned by the subquery, that is 1, 2, 3, 4 or 5.
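For comparison, the same filter written with EXISTS might look something like this sketch (using the same hypothetical BigTable and SomeColumn, and run in the same batch as the table variable above):

```sql
SELECT *
FROM BigTable b
WHERE EXISTS (SELECT 1 FROM @SomeTable s WHERE s.IntCol = b.SomeColumn)
```

Note that the EXISTS subquery is correlated to the outer table and that what it selects is irrelevant; only whether a matching row exists matters.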

But what if there were duplicate values returned by the subquery?

(more…)

SQL Quiz 5 – SANs and magic tricks

Chris Shaw’s come up with yet another of his SQL quizzes, this one’s a little more technical than the previous few have been. Tim Ford tagged me on this one.

1) Do you feel that you have a reliable SAN Solution? If so what is the secret?

Well, this one’s easy for me. What SAN?

I work from home and don’t directly admin databases any longer. Unlike Paul and Kimberly, I can’t afford a multi TB SAN in the basement to play with (not that I have a basement)

The most advanced storage I have here is a RAID 5 array on my fileserver/domain controller, and that came in very handy a couple months ago when one of the drives in there failed without warning.

2) Describe in layman’s terms what database mirroring is

Ok, do you know the kind of sleight-of-hand tricks that performing magicians do? Show a coin in one hand, make it disappear and then have it appear in the other hand? The trick is that there are two identical coins, but only one is ever visible to the audience.

Database mirroring’s like that. There are two identical databases (kept identical by technological wizardry), the users can only ever see one of them, and when one disappears the other one appears, hopefully fast enough that the users don’t get frustrated waiting.

Let’s see… going to pass this one on to Paul Randal and Christopher Stobbs

Is a scan a bad thing?

This one comes up from time to time, so I thought I’d have a go at addressing it.

Let’s imagine a hypothetical DBA who’s doing some performance tuning. He looks at a query plan for a moderately complex query and panics because there are a couple of index scans where he would rather see index seeks.

Is that correct, are index scans bad and index seeks good?

Well, most of the time yes. Most of the time a scan is a problem and indicates a missing index or a query problem, but there are other times that it’s the most optimal way to get the required rows from the table.

I’ve previously looked at the case where an index seek actually reads the entire table. In this post I’m going to evaluate some common query constructs to see when a seek really is the optimal operator.

Let’s start with the simplest case, and I’m going to use the AdventureWorks database for these queries.

select ProductID, Name from Production.Product

In this case I get an index scan on the AK_Product_Name index, and that makes perfect sense. I’m asking for all the rows in the table. There is no way that SQL can use a seek to execute that query. For there to be a seek, there has to be a SARGable predicate within the query that can be used for the seek.
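By contrast, adding a SARGable equality predicate on the index key gives the optimiser something to seek on. A minimal sketch (the literal value is just for illustration):

```sql
-- With an equality predicate on Name, the key of AK_Product_Name,
-- SQL can seek directly to the matching row instead of scanning.
SELECT ProductID, Name
FROM Production.Product
WHERE Name = N'Blade'
```

Here the plan should show an index seek on AK_Product_Name, because the predicate matches the index key exactly.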

(more…)

Why the DMVs are not a replacement for SQL Trace

With the introduction of the DMVs in SQL 2005, there’s now a wealth of information available on every aspect of SQL’s behaviour. It’s possible now to interrogate the SQL procedure cache to find out what queries have been running and how well or badly they are performing. So does all this mean that SQL Trace is now obsolete?

It does not. The execution stats available through sys.dm_exec_query_stats are only retained while the plan for the query is in cache. As soon as the plan is removed from cache (for whatever reason), the query stats for that query will be discarded. There are also a couple of reasons for a query’s plan not going into the cache at all.
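A typical interrogation of the plan cache looks something like this sketch, and it illustrates the limitation: anything whose plan has been evicted from cache simply doesn’t appear in the results.

```sql
-- Execution stats for currently-cached plans only.
-- Rows disappear as soon as the plan is removed from cache.
SELECT qs.execution_count,
       qs.total_worker_time,
       qs.total_logical_reads,
       st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_worker_time DESC
```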

Let’s try a quick example. I’m going to use the AdventureWorks database (SQL 2008), because it’s convenient. I’m going to create four procedures, run them several times along with a couple of other commands, and then compare what a trace shows with what a query of the plan cache shows.

(more…)

The Lazy Writer and the Checkpoint

Or “What writes what when?”

It’s important to understand how and when SQL makes changes in memory and how and when those changes are written to the data file. It has a big impact on memory use and IO patterns.

Data modification queries (insert, update, delete) always make changes to data pages in memory. In fact, queries in general only operate on in-memory data pages. When a query makes a change to an in-memory page, that page is marked as ‘dirty’, indicating that there are changes on that page that have to be written to disk before the page can be discarded from memory. That writing is the job of two system processes, the lazy writer and the checkpoint.
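On SQL 2005 and higher, the dirty pages currently sitting in the buffer pool can be seen via a DMV. A quick sketch:

```sql
-- Count dirty pages (modified in memory, not yet written to the
-- data file) per database, from the buffer pool.
SELECT DB_NAME(database_id) AS DatabaseName,
       COUNT(*) AS DirtyPages
FROM sys.dm_os_buffer_descriptors
WHERE is_modified = 1
GROUP BY database_id
```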

Lazy Writer

The job of the lazy writer is to find dirty pages in the buffer pool and write them out to disk and drop those pages from cache. It does this to keep a certain amount of free pages available within the buffer pool for data that may be requested by other queries. The pages that it writes out are ‘older’ pages, ones that haven’t been used for a while.

(more…)

Deleting the transaction log

Or “Why is my database now inaccessible?”

This is another thing that I’ve seen a few times recently on various forums. A DB’s transaction log fills the disk, probably due to a lack of log backups, and then either SQL is stopped and the log file deleted, or the database is detached and the log file deleted. Either way, it’s not a particularly good thing to do.

No, I’ll be blunt, it’s an appallingly bad thing to do. The transaction log is not an optional piece of the database. It’s not like an installation log or activity log where the entries are there just in case anyone’s interested. The transaction log is what SQL uses to ensure, at all times, that the data in the database is transactionally consistent. It’s what gives SQL databases the consistency and durability properties that are required from a relational database engine.
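For completeness: if the log has filled the disk because log backups aren’t being taken, the supported response is to back the log up (or switch to simple recovery if point-in-time restores aren’t needed), never to delete the file. A sketch, with a hypothetical database name and backup path:

```sql
-- See how full each database's log actually is
DBCC SQLPERF(LOGSPACE)

-- Back up the log so the inactive portion can be reused
BACKUP LOG SomeDB TO DISK = N'D:\Backups\SomeDB_log.trn'
```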

The transaction log has a number of uses within SQL.

  1. In SQL 2000 and before, the inserted and deleted tables were materialised from the transaction log. This was changed in SQL 2005; they’re now materialised from the row version store in TempDB.
  2. A transaction rollback uses the transaction log to determine what needs to be undone.
  3. In transactional replication, the log reader uses the transaction log to determine what changes need to be replicated.
  4. Change Data Capture in SQL 2008 uses the transaction log to extract changes made to registered tables.
  5. During restart-recovery, it is used to ensure that transactions that had committed when the service stopped, but whose changes had not been written to the data file, are replayed, and that transactions that hadn’t completed are rolled back.

For now, I’m just going to consider the last one.

(more…)

Does Repair_Allow_Data_Loss cause data loss?

Short answer: Yes. That’s why the option has that name.

Long answer:

When CheckDB finds corruption it will, as part of the output, include the minimum level of repair that will fix the errors that it found. In SQL 2005 and higher, it will look something like the following.

CHECKDB found 42 allocation errors and 137 consistency errors in database ‘SomeDB’.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (SomeDB).

There are two options for the minimum repair level. REPAIR_REBUILD and REPAIR_ALLOW_DATA_LOSS.

If the minimum level is REPAIR_REBUILD then it means that the corruption was confined to just the nonclustered indexes. This is good. Nonclustered indexes are redundant data and hence can be dropped and recreated (which is pretty much what CheckDB with the REPAIR_REBUILD option does).

If the minimum level is REPAIR_ALLOW_DATA_LOSS then running CheckDB with the REPAIR_ALLOW_DATA_LOSS option will cause data loss. That’s why the option is named like that. How much it will lose depends on the extent of the corruption.
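For reference, the repair has to be run with the database in single-user mode, and only once restoring from a clean backup has been ruled out. A minimal sketch (the database name is hypothetical):

```sql
ALTER DATABASE SomeDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

DBCC CHECKDB (SomeDB, REPAIR_ALLOW_DATA_LOSS);

ALTER DATABASE SomeDB SET MULTI_USER;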

There’s one other case. What about when CheckDB doesn’t specify a minimum level?

CHECKDB found 3 allocation errors and 7 consistency errors in database ‘SomeDB’.

That means that there’s damage to one or more of the allocation pages or there’s corruption in the system tables. That kind of corruption isn’t repairable, so no minimum repair level can be specified.

So, in conclusion, will running CheckDB with the REPAIR_ALLOW_DATA_LOSS option cause data loss? If that’s the minimum level that CheckDB specified, then yes it will.

For more info, see Paul Randal’s blog and an article that I wrote at SQL Server Central.

Edit: There is one case where the repair level repair_allow_data_loss is required, but no data will be lost. This is the case of the incorrect PFS page (see my article at SQLServerCentral for more info). If that is the only error, repair_allow_data_loss will be the minimum level, but no data will actually be lost.