Gail | SQL in the Wild

Are int joins faster than string joins?

Author: Gail 15 February 2011 23 Comments

This one comes up a lot on the forums, often as advice given…

“You should use integers for the key rather than strings. The joins will be faster.”

It sounds logical. Surely integers are easier to compare than large complex strings. Question is, is it true?

This is going to take lots and lots of rows to get any significant results. I’m going to do two sets of tests. The first comparing query execution speed between string and integer data types on the join column while keeping the size of the join column the same between the two. This is to tell if there’s a difference just because of the data type.

The second test will have both the data type and the size of the join columns differing, while the total size of the table row will be kept the same. This is to answer the ‘string joins are slower because they are larger’ argument.

Test 1: Same key size, no indexes

The two tables have the same size join column – a bigint in one and a char(8) in the other.

1

2

3

4

5

6

7

8

9

SELECT t1.ID, t2.ID, t1.IntForeignKey, t2.SomeArbStatus
FROM dbo.TestingJoinsInt t1
INNER JOIN dbo.LookupTableInt t2 ON t1.IntForeignKey = t2.ID
GO
 
SELECT t1.ID, t2.ID, t1.StrForeignKey, t2.SomeArbStatus
FROM dbo.TestingJoinsString t1
INNER JOIN dbo.LookupTableString t2 ON t1.StrForeignKey = t2.ID
GO

First up, a check of what Statistics IO and Statistics Time show for the two and whether there’s any difference in execution plan. (Full repo code is available for download, link at the end of the post)

Int joins

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0.
Table ‘TestingJoinsInt’. Scan count 1, logical reads 66036, physical reads 0.
Table ‘LookupTableInt’. Scan count 1, logical reads 735, physical reads 0.

SQL Server Execution Times:
CPU time = 2433 ms, elapsed time = 32574 ms.

String joins

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0.
Table ‘TestingJoinsString’. Scan count 1, logical reads 66036, physical reads 0.
Table ‘LookupTableString’. Scan count 1, logical reads 735, physical reads 0.

SQL Server Execution Times:
CPU time = 3744 ms, elapsed time = 33947 ms.

(more…)

Performance, SQL Server, Syndication

Quest Webcast: Dos and Don’ts of Database corruption

Author: Gail 14 February 2011 0 Comments

I was guest presenter on last week’s Quest Pain of the Week presentation. The presentation is available for download from Quest. This is a roundup of the questions from the presentation. (I have edited some of the questions for spelling, grammar and readability)

Q: Why do you have to wait for a maintenance window to take a backup? you can backup during the time the database is being used.

You don’t have to wait, and with a 24×7 system you obviously can’t. But full backups do have an impact even though database can be in use at the time. They’re IO intensive. If you’re going to do backups while the system is in use, make sure that the additional IO load doesn’t adversely effect user performance, or look into things like SAN-level snapshot backups.

Q: If you partition your data, can you only backup the latest partition?

Yes, however be mindful of the requirements for restoring from file/filegroup backups. In full or bulk-logged recovery if the older filegroups are not read-only, you’ll need log backups covering the entire span from the oldest of the file/filegroup backups you use to the newest.

If the partitions with older data are read-only, then this is very easy to do.

For more information see http://msdn.microsoft.com/en-us/library/ms177425.aspx

Q: What if you inherit a database that fails the integrity check every day and the backups have the same issue so you don’t know when the integrity check originally failed?

That just means that restore from backup (full, file/filegroup or page) is not an option for fixing. Probably means that there will be no avoiding data loss.

Q: The 825 errors – these are reported in the logs too aren’t they? I can review to see if these are occurring, even if the alert is not set?

Yes they are (and I don’t know why I thought during the presentation that they aren’t, I’ve written a blog post on them myself). https://www.sqlinthewild.co.za/index.php/2008/12/06/when-is-a-critical-io-error-not-a-critical-io-error/

Msg 825, Level 10, State 2, Line 1.
A read of the file ‘D:\Data\SomeDB.mdf’ at offset 0x00000020e24000 succeeded after failing 1 time(s) with error: incorrect checksum (expected: 0x7532c420; actual: 0x320e4240). Additional messages in the SQL Server error log and system event log may provide more detail. This error condition threatens database integrity and must be corrected. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Q: Can you use Resource Governor for a large DB backup to keep the DB’s performance at an acceptable level?

Resource governor only limits CPU and memory, not IO.

Q: If you have data loss, shouldn’t you be doing a restore?

Maybe. It could be (due to time to restore or other constraints) that the fastest and acceptable solution is to repair (CheckTable or CheckFileGroup) and then sync in missing data later.

Q: I just had this happen on an archive database that I can play with. The problem occurred with lob’s. Since allow data loss removes the entire page that the corruption is on how does this work with lob’s that are not stored on the same page as the record?

The entire LOB chain (all LOB pages for the row) will be deallocated and the data/index record which is the parent will be deleted. (Ref: Chapter 11 of SQL Server 2008 Internals by Kalen Delaney et al. Chapter 11 written by Paul Randal)

Q: Is it possible to create a corrupted database in a test lab to get an 825 error?

Error 825 is a hard one to generate. To get that you need for an IO to fail or the retrieved page to be damaged and then on a retry (which occurs immediately) for the IO to succeed and the page to be correct. That’s hard to fake. You’ll need an IO filter driver that is coded to intermittently mess up IOs.

For 824, either use a hex editor to edit pages of a test database, or download and restore Paul Randal’s pre-corrupted databases.

For 823, create a database with a secondary file on a flash drive. Create a large table in the filegroup on the flash drive, checkpoint, DBCC dropcleanbuffers, then pull the flash drive out and query the table. If that doesn’t throw the error (meaning the pages are still in memory), update a row and checkpoint.

It goes without saying that neither of these should ever be done on a server!

Q: In case of db suspect (not recoverable) can we update the system table sysdatabases and change the status to make it online (this solution used to work in SQL 2000)

That is a SQL 2000 solution. In SQL 2005 and above, sysdatabases isn’t a table and the system tables cannot be updated as was possible in SQL 2000.

Fortunately there’s a fully documented and supported alternative – ALTER DATABASE <Database name> SET EMERGENCY. Once in emergency mode you can run CheckDB with repair_allow_data_loss and then bring the database online (assuming the repair succeeds), or extract data if the repair fails.

Q: Is Check DB the only tool available?

To check database integrity, yes.

Q: If you got a foreign key, you can restore a table but will integrity not be broken?

Relational and possibly transactional integrity may be broken. I’d guess that’s one reason table-level restores aren’t available natively in the product.

Q: Checksum on database backup – this is not an option in maintenance plan – true? To enable this, I have to script all database backups?

Correct, it’s not an option on maintenance plans. Personally I’m not a fan of maintenance plans for anything other than trivial environments.

Q: We use some SQL Express, do all these items work in Express? I had one that the hard drive died and I was able to get it back from suspect at which point I did a full backup and restored it elsewhere on a full SQL Server to fix it

The only limitation for CheckDB on Express is no parallelism.

File/filegroup restores can be done on Express, though they are offline restores.

Page restores can be done on Express, though they are offline restores

Thanks to Paul Randal and Robert Davis for assistance with a couple of these answers.

Corruption, SQL Server, Syndication

Is a clustered index best for range queries?

Author: Gail 1 February 2011 10 Comments

I see a lot of advice that talks about the clustered index been the best index for use for range queries, that is queries with inequalities filters, queries that retrieve ranges of rows, as opposed to singleton queries, queries that retrieve single rows (including, unfortunately, a Technet article).

I suspect the reasoning behind this advice is the idea that the clustered index stores the data in order of the clustering key (ack) and hence it’s ‘logical’ that such a structure would be best for range scans as SQL can simply start at the beginning of the range and read sequentially to the end.

Question is, is that really the case?

Let’s do some experiments and find out.

01

02

03

04

05

06

07

08

09

10

11

12

13

14

15

16

17

18

19

20

21

22

23

CREATE TABLE TestingRangeQueries (
ID INT IDENTITY,
SomeValue NUMERIC(7,2),
Filler CHAR(500) DEFAULT ''
)
 
-- 1 million rows
INSERT INTO TestingRangeQueries (SomeValue)
SELECT TOP (1000000) RAND(CAST(a.object_id AS BIGINT) + b.column_id*2511)
FROM msdb.sys.columns a CROSS JOIN msdb.sys.columns b
 
-- One cluster and two nonclustered indexes on the column that will be used for the range filter
 
CREATE CLUSTERED INDEX idx_RangeQueries_Cluster
ON TestingRangeQueries (ID)
 
CREATE NONCLUSTERED INDEX idx_RangeQueries_NC1
ON TestingRangeQueries (ID)
 
CREATE NONCLUSTERED INDEX idx_RangeQueries_NC2
ON TestingRangeQueries (ID)
INCLUDE (SomeValue)
GO

The query that I’ll be testing with will do a sum of the SomeValue column for a large range of ID values. That means that of the three indexes that I’m testing, one is clustered, one is a nonclustered that does not cover the query and the third is a covering nonclustered index.

1

2

3

SELECT SUM(SomeValue)
FROM TestingRangeQueries
WHERE ID BETWEEN 20000 and 200000 -- 180 001 rows, 18% of the table

I’m going to run the same range scan query three times, each with an index hint so that SQL will use the three different indexes, regardless of which one it thinks is best.

First up, the clustered index.

As expected, we get a clustered index seek (the predicate is SARGable) and a stream aggregate.

Table ‘TestingRangeQueries’. Scan count 1, logical reads 12023, physical reads 0.

SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 110 ms.

(more…)

Indexes, Performance, SQL Server, Syndication

Upcoming presentations

Author: Gail 28 January 2011 0 Comments

It’s looking like it’s going to be a busy year.

It’s only January and I already have three presentations lined up for the first quarter of the year.

On the 10^th February I’m presenting on database corruption for Quest’s Pain of the Week webcast. I’m not Paul Randal, but I think I might be able to muddle through the topic without making too much of a fool of myself.
27^th February I’m speaking at SQLSaturday 65 in Vancouver! First trip to Canada, first SQLSaturday attendance. I’m doing a presentation that’s an old favourite of mine – Introduction to Indexes. It’s very much aimed at beginners, so don’t expect any dark magic or mystical secrets
15th March I’m taking part in the 24 Hours of PASS. The session is titled “Bad plan! Sit” Thanks to Steve Jones (blog|twitter) for the initial idea.It’s going to have something to do with bad execution plans. I haven’t decided exactly what yet. I guess that means it will be a surprise.

In addition to these, I’m also presenting at both the February and March meetings of the South African SQL Server usergroup.

That’s definitely enough work for the next two months…

Community, Syndication

Distincting an IN subquery

Author: Gail 18 January 2011 7 Comments

This is going to be a quick one…

I keep seeing forum code (and production code) that includes the DISTINCT in IN or EXISTS subqueries. The rationale is given either as a performance enhancement or as necessary for correct results.

Is it necessary or useful? Only one way to find out.

Let’s check for correct results first, because that can be done with nice small tables.

01

02

03

04

05

06

07

08

09

10

11

12

13

CREATE TABLE DistinctOuter (
ID INT
);
 
CREATE TABLE DistinctInner (
ID INT
);
 
INSERT INTO DistinctOuter
VALUES (1), (2), (3), (4), (5), (6), (7), (8)
 
INSERT INTO DistinctInner
VALUES (1), (2), (2), (2), (2), (4), (6), (7)

(more…)

Performance, SQL Server, Syndication

Capturing the Execution Plan

Author: Gail 4 January 2011 9 Comments

One last post on execution plans and Profiler (at least for now)

When trying to check a query’s execution plan, the display execution plan option of Management Studio is usually adequate, however there are occasions where it’s either not feasible to run the query from Management Studio or the particular behaviour can’t be reproduced in Management Studio (perhaps because of different SET options). For cases like this it may be necessary to capture the execution plans via SQL Trace. Fortunately there are events for execution plans. Eight of them, to be precise, all under the Performance folder.

Great, so there’s no shortage of options available. But what are the differences between them?

Showplan All

According to Books Online:

The Showplan All event class occurs when Microsoft SQL Server executes an SQL statement. Include this event class to identify the Showplan operators on Microsoft SQL Server 2000 or Microsoft SQL Server 7.0. This event class will also work on SQL Server 2005 and later; however, the information included is a subset of the information available in the Showplan XML Statistics Profile or Showplan XML event class.

So in other words this one is not generally the event that we should be looking at on the latest versions of SQL. It’s not deprecated however (at least not in SQL 2008), so it is still usable on the later versions if you absolutely want.

(more…)

Execution Plans, SQL Server, Syndication

The books of the year

Author: Gail 1 January 2011 11 Comments

At the beginning of the year I set myself a goal of 50 books. I managed 45, which isn’t bad considering there were a couple of months where I was re-reading old favourites only.

I’m not going to go into the level of detail Paul Randal did, rather I’m just list the books read, the genre and give my top 3 of the year.

Top 3:

Princeps’ Fury: Book Five of the Codex Alera by Jim Butcher I’m adoring the entire Codex Alara series. Well written, strong character, interesting form of magic and looks like a climactic ending. The last book should be in my post box in a couple of weeks.
Sabriel (Abhorsen) by Garth Nix Actually the entire series belongs here. They’re apparently teenage books, but they’re complex and deal with adult themes.
In Search of Schrödinger’s Cat: Quantum Physics and Reality by John Gribbin Exceptional take on the birth of quantum physics, minimal to no maths skills required. It’s written for the layman and is part history, part physics.

Complete list:

Fantasy:

Science Fiction

Other fiction

The Templar Legacy by Steve Berry
Stray by A.N. Wilson
Rainbow Six by Tom Clancy
The Teeth Of The Tiger (Jack Ryan) by Tom Clancy
Wolf of the Plains by Conn Iggulden
Night Probe! by Clive Cussler
Valhalla Rising (Dirk Pitt Adventure) by Clive Cussler

Non-fiction

Goal for 2011… 50 books.

Personal

Looking back

Author: Gail 1 January 2011 3 Comments

2010 has been an ‘interesting’ year. Interesting in the Chinese curse sense of the word mostly.

I set a bunch of goals back at the end of 2009, but to be quite honest I’m not even going to look at them. If I achieved even one it was more chance than anything. Still there have been a couple of positive aspects to the year.

Enthusiasm for SQL-related stuff (and pretty much everything else) was finally relocated. I wrote 3 blog posts this last week, and it didn’t feel like a chore for the first time in many months.
For the first time since I moved to consulting, my income comfortably exceeds my expenses. Moving to consulting at the start of an economic downturn was probably a stupid thing to do, but I never do things the easy way.
Got the hardest part of my Masters thesis out of the way, all that’s really left is a bit more data collection and writing up.

So, some goals for the new year…

Finish the thesis. This isn’t optional. I’ve committed to handing it in by June.
Aikido at least once a week and archery at least one Saturday a month. I’ve skipped far too many classes and practices these last few months.
Attempt the first MCM test. This probably won’t be until the latter part of the year. The Master’s thesis comes first
Get back into computer graphics. Old hobby that’s been mostly neglected, but that I still enjoy

No blog-related goals or article quotas. I intend to write a fair bit, but I don’t want to set requirements.

One last thing, for anyone wondering about the computer graphics, these are some work-in-progress or finished pieces.

Personal

Do IF statements cause recompiles?

Author: Gail 14 December 2010 2 Comments

I heard this one over at SSC a while back. “Avoid IF statements in stored procedures as they result in recompiles”

Ok, it sounds reasonable, if the optimiser optimises for the execution path taken on first execution it’ll have to go back and optimise the other paths when they are executed.

But is that really how it works? Now that I’ve spend some time looking at how the cache monitoring trace events behave, it’s possible to find out using those.

Let’s start with a simple example

01

02

03

04

05

06

07

08

09

10

11

12

13

14

15

16

17

CREATE TABLE Tbl1 (
  ID INT
);
 
CREATE TABLE Tbl2 (
  ID VARCHAR(10),
  SomeDate DATETIME
);
GO
 
CREATE PROCEDURE TestingRecompiles (@SomeParam INT)
AS
IF (@SomeParam = 1)
  SELECT ID FROM Tbl1;
ELSE
  SELECT SomeDate FROM Tbl2;
GO

Simple enough. First execution will be with the parameter value of 1. I’m going to use Profiler to see what’s happening. Events traced are SP:CacheInsert, T-SQL:StmtRecompile and the XML Plan for query compile, so I can see exactly what plan was generated. I’m using the ‘For Query Compile’ event so that I can catch the plan at optimisation time, not at execution time.

(more…)

Execution Plans, SQL Server, Syndication

On burnout and recovery

Author: Gail 9 December 2010 16 Comments

I suspect for a fair number burnout is something that they believe happens to others. People with less fulfilling jobs, people who don’t love their work, people in poorly managed companies. Other people.

i certainly didn’t consider myself at risk for burnout. I loved my work, I had a great manager prior to moving to consulting and was pretty much my own manager afterwards. I had a hobby (or 5), practiced a sport, took time to play games and relax. At risk? Never!

At least that’s what I thought until sometime around May last year when I realised I had absolutely no enthusiasm for anything related to IT. Blogging was a chore, books held no interest and many a day I opened management studio intending to do some development or investigation and then several hours later I’d close it without a line of code having being written.

The biggest mistake (I think) was in how I handled it. Instead of taking a break (and work was quiet around that time so I could), I took on more commitments, thinking that it would help motivate. Exceptional DBA Awards. SSC Articles. Tech-Ed presentations. PASS Summit presentations. SQL Exams. Hell I even started discussing and planning to write a book.

Bad idea.

Far from acting as motivation, the long list of things to do just made things worse. Far worse. I now had hard deadlines, people chasing me and still no motivation or enthusiasm. Naturally everything came to a head at the worst possible time and place – the PASS summit in Seattle that year. It’s really hard to write and deliver speeches, prep and deliver presentations and generally act friendly and enthusiastic when the only thing that you want to do is get on the next flight and go home. Fortunately I think only a couple of people noticed.

It’s only in the last couple of months (October/November 2010) that the enthusiasm for writing, blogging, researching (and in fact anything other than sitting and watching the world go by) has partially returned. Is the burnout past? No, definitely not, I still have days (weeks) where I can’t summon the enthusiasm to care, but it is getting better.

But this isn’t just a post on history or a poor attempt at sympathy. The point is how I handled it (badly). Looking back, what I should have done was

Take a break (work was quiet at the time). Not a weekend crashed in front of the TV, but a proper break – out of town for a week or so.
Get some help from friends, rather than pushing them away and pretending everything was fine.
Try a new technology rather than piling on SQL stuff. WCF, Ruby, F#, anything as long as it was different.

I’m far from qualified to offer advice on this issue, but I would suggest to anyone feeling the same way, don’t ignore it, don’t try to work through it and don’t hide it. Few problems go away by themselves, burnout certainly doesn’t.

Personal

Author Archive: Gail

Are int joins faster than string joins?

Test 1: Same key size, no indexes

Quest Webcast: Dos and Don’ts of Database corruption

Is a clustered index best for range queries?

Upcoming presentations

Distincting an IN subquery

Capturing the Execution Plan

Showplan All

The books of the year

Looking back

Do IF statements cause recompiles?

On burnout and recovery

Post navigation