Latest Posts

Greatest weakness

Someone went off and started asking people what their greatest weaknesses are, then someone else decided to pass the question my way. Perhaps those someones’ weakness is curiosity… Still, since everyone else is revealing their darkest secrets, I’ll give it a go as well.

Mine would have to be two very closely related things: procrastination and a short attention span. (Why do you think this post is four days late?)

I tend to put stuff off until the absolute last minute (and sometimes beyond then) and then work frantically to get it finished. And because I’m also a bit of a perfectionist, I’m not satisfied with half-done work or a hack job so I’ll be working late into the night (or over a weekend) to get it done properly.

To add to that, unless I’m doing something I really enjoy, my attention span tends not to be very long. 15-20 minutes is good; it’s usually less. After that it’s either fight to stay focused or take a short break. In and of itself, that’s not so bad; the problem is that the breaks are often not so short if I get caught up in whatever else I’m doing, or get distracted by something else (and something else and something else).

So what am I doing about this?

(more…)

Are trivial plans cached?

It is sometimes said that trivial execution plans are not cached and queries that have such plans are compiled on every execution. So is that true? To effectively answer this question, we must first establish what a trivial plan is.

A trivial plan is essentially a plan for a query where a specific plan will always be the most optimal way of executing it. If we consider something like SELECT * FROM SomeTable then there’s only one real way to execute it: a scan of the clustered index or heap.

The trivial plan is somewhat of a query optimiser optimisation. If the query qualifies for a trivial plan (and there are lots of restrictions) then the full optimisation process doesn’t need to be run, and so the query’s execution plan can be generated faster and with less overhead. The fact that a query has a trivial plan at one point doesn’t necessarily mean that it will always have one; indexes may be added that make the selection of plan less of a sure thing, and then the query must go through full optimisation rather than getting a trivial plan.
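As a rough illustration (hypothetical table and index names, and the exact behaviour can vary by version), here’s the kind of change that can push a query from a trivial plan into full optimisation:

-- With only a clustered primary key there's just one sensible plan,
-- so this typically compiles with an optimisation level of TRIVIAL.
CREATE TABLE dbo.SomeTable (
    ID INT IDENTITY PRIMARY KEY,
    SomeValue INT,
    Filler CHAR(100)
);

SELECT * FROM dbo.SomeTable;

-- Add a nonclustered index and filter on that column. The optimiser now has
-- to weigh a seek plus key lookups against a clustered index scan, so the
-- query typically goes through FULL optimisation instead.
CREATE INDEX idx_SomeTable_SomeValue ON dbo.SomeTable (SomeValue);

SELECT * FROM dbo.SomeTable WHERE SomeValue = 42;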

Nice theory, but how does one tell if a particular query has a trivial execution plan? The information is found within the execution plan: the properties of the highest-level operator have an entry ‘Optimisation level’. For a trivial plan this will read ‘TRIVIAL’.

Trivial plan
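To check from the plan cache side, a sketch like the following will show whether a plan for the query is cached, how often it has been reused and what its optimisation level was (the text filter is only there to narrow things down to the example query):

SELECT cp.usecounts,          -- climbs on re-execution if the cached plan is being reused
       cp.objtype,
       st.text,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE '%SomeTable%'
  AND CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE '%StatementOptmLevel="TRIVIAL"%';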

(more…)

The most optimal join type

What’s the best join type for a query? Should we aspire to seeing nested loop joins in all our queries? Should we tremble with horror at the sight of a hash join?

Well, it depends. 🙂

There’s no single join type that’s best in every scenario and there’s no join type that’s bad to have in every scenario. If one of the join types, say the much maligned hash join, was very much a sub-optimal join type in every single scenario, then there would be no reason for it to be in the product and no reason for the optimiser to ever select it for a plan. Since there are three join types, and the optimiser can and does use all three, we must assume that they are all useful under some circumstances.

I took a look at the joins a while back, but it’s worth revisiting.

The nested loop join

A nested loop join is an optimal join type when two conditions are true.

  1. One of the resultsets contains quite a small number of rows.
  2. The other table has an index on the join column(s).

When both of these are true, SQL can do a very efficient nested loop join. The smaller resultset becomes the outer table of the join; a loop runs across all the rows in that resultset, and index seeks are done to look up the matching rows in the inner table. It’s important to note that the number of seeks against the inner table will not be less than the number of rows in the outer table at the point the join occurs.
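A sketch of a query shape that typically gets a nested loop join (the table and index names here are made up purely for illustration):

-- The filter on CustomerID produces a small outer resultset, and the inner
-- table is assumed to have an index on its join column, e.g.
--   CREATE INDEX idx_Orders_CustomerID ON dbo.Orders (CustomerID);
-- so a nested loop join with index seeks against dbo.Orders is the likely choice.
SELECT c.CustomerID, c.CustomerName, o.OrderID, o.OrderDate
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID
WHERE c.CustomerID = 42;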

If one resultset has a small number of rows but there is no index on the other table on the join column(s), then a loop join can still be done, but it is less optimal, as the entire inner table (or a subset based on another filter condition) must be read on each iteration of the loop.

If both resultsets have large numbers of rows but there is an index on the join columns in one of the tables then the nested loop can still read through one of the resultsets and do index seeks to locate matching rows, but the number of rows in the outer table will mean lots and lots of seek operations, which may result in a sub-optimal plan.
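One rough way to see the cost of all those seeks (again with made-up table names; the join hint is there just to force the comparison, not as a recommendation) is to compare the I/O of a forced loop join against the optimiser’s own choice:

SET STATISTICS IO ON;

-- Forcing a nested loop between two large resultsets: expect a very large
-- number of executions (seeks) against the inner table.
SELECT o.OrderID, od.ProductID
FROM dbo.Orders AS o
INNER JOIN dbo.OrderDetails AS od ON od.OrderID = o.OrderID
OPTION (LOOP JOIN);

-- Letting the optimiser choose; for large inputs this will often be a hash
-- or merge join with far fewer reads.
SELECT o.OrderID, od.ProductID
FROM dbo.Orders AS o
INNER JOIN dbo.OrderDetails AS od ON od.OrderID = o.OrderID;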

(more…)

PASS Day 3 and post-con

Once the pre-con was done I dropped into the SQLCAT sessions on consolidation and virtualisation. Late, because I was cleaning up at the blogger’s table (Adam wasn’t there that day). Interesting session, the best-practices were worth noting down and their graphs on performance were encouraging.

After lunch I was intending to go to Allen’s clustering session, but it was over my head, so I switched to Buck’s SQL manageability presentation.

Buck Woody’s session is a laugh-riot. Anyone he knows within view gets picked on, and various groups get insulted. Insulted in that session:

  • Me
  • British people
  • South Africans
  • Developers
  • U2 fans
  • Baptists
  • Microsoft
  • Elderly people

All in good humour, no serious insult intended.

I missed the next session and chatted with Brent Ozar instead. Last of the day was my session on statistics. The audience was a little tired, but the session went quite well. I asked about table sizes and someone claimed a 100 billion row table. I really do want to know what kind of data they’re storing that generates a table that size.

The post-con day was quiet in comparison with the days before. I split my time between a post-con on semi-structured and unstructured data and some insider sessions. There was some cool stuff under discussion in the insider sessions and lots of good debate. The post-con was useful, the part on full-text most of all; it’s not something I’ve played with much in the past.

That’s PASS over and done for another year. Same time, same place, next year.

PASS Keynote – Day 3

The keynote opened with announcements regarding the PASS board. There was a very emotional speech from Wayne Snyder as the board said goodbye to Kevin Kline, who steps down from the PASS board after 10 years.

I first met Kevin when I attended the European conference in 2005 (in Munich). I didn’t know anyone and I didn’t know very much. Kevin was at the opening party and he was talking to everyone there, making sure that they felt welcome and that they were engaged in the conversation. There was no feeling that he was faking interest; he was genuinely interested in what people were doing with SQL.

The changes to the board, as well as the new board members, are that Wayne Snyder becomes "Immediate past president" and Rushabh Mehta takes the role of President.

After the board announcements we got the keynote ‘tax’: a long, boring discussion by a Dell person. Configuration management, consolidation. Some waffle about disaster recovery, but without any meat. Or maybe consolidation; I haven’t quite worked it out. If this was a session, he’d be talking to an empty room.

David DeWitt’s presentation is looking at the historical and future trends of database platform changes. The improvement in CPU power way outstrips the improvement in disk speed. Hopefully SSDs will fix that.

The disk trends that he’s discussing are scary. Relatively speaking (considering the size of the disks today), drives are much slower than they were 20 or so years ago. Sequential reads are faster than random by a greater factor than some years back. Random reads hurt.

CPUs are faster, waay faster than they were years ago. Accessing memory takes more cycles than historically, like 30x more.

The way databases are currently designed, they incur lots of L1 and L2 cache misses. L2 cache miss can stall the CPU entirely for 200 cycles. This is why it’s so hard to max out modern processors. They spend so much of their time waiting. What makes it worse is that the cache lines are only 64 bytes wide. If a row is more than 64 bytes wide, moving a row from one cache to another will require more than one cycle and potentially more than one cache-miss. Changing database architecture to a column-store may alleviate this.

Compression is a CPU/disk tradeoff and that’s fine. CPUs are 1000x faster now and disk only 65x faster. Hence use some of those CPU cycles to help the poor disks along.

Some very interesting discussions on compression with column-stores. Store the data multiple times in different orders. David: "After all, we need to do something with those terabyte drives"

Run-length encoding works so well with a column store, especially if the columns are stored ‘in order’. Dictionary compression is good if the columns are not in order.

A hybrid model is also an option: some of the benefits of the two, some of the downsides of the two. There are lots of academic papers on those options; search Google Scholar if interested.

Some photos from David’s presentation

DiskTrends

DiskChanges

CPUArchitecture 

DSCF0231

PASS Main conference – Day 2

Too tired to post much right now.

I skipped the keynote this morning; the BI just doesn’t interest me that much. I also missed the morning session to hit the vendor hall, since I couldn’t get there last night.

The MVP book signing occurred around midday. There must have been around 200 books to sign. The funniest part was the comments from the other authors, putting a SQL-related spin on the long queue and pile of books. Overheard: “Looks like the input buffer’s filling up down there”

Also skipped the first of the afternoon sessions to sit in the “Ask the Experts” area and chat with a couple of the devs on topics around statistics. There was something I needed to clarify before my session tomorrow. Unfortunately the dev who worked on that area wasn’t there. Hopefully there’ll be some mail around lunchtime tomorrow with the answer.

My session on indexing went ok. Not fantastic, but ok. Will see if stats goes better tomorrow.

For the last session I sat in on a discussion on XQuery. I’m starting to understand how that works, though I need to play with it more. I don’t really agree with the ‘use XML for tables with changing column definitions’ idea, though.

Photos tomorrow.

PASS Main conference – Day 1

So, today was the first day of the main conference and suddenly there’s 4x the number of people around.

One thing I do have to say straight off, congrats to the conference centre for actually having the wireless network working on the 1st day of the conference. Has to be the first time this has happened in the years I’ve been attending.

The keynote wasn’t as interesting as it could have been. No real surprise: there were no product announcements, no astounding features to demo (other than stuff that’s been seen before), no launch to be done. It was a discussion of where we’ve come from and where we’re going (including into the cloud), and then some demos of R2.

Bob’s level 500 session was a good way to start the conference. Maybe not as deep as the memory internals last year, but good, solid info. Useful stuff there.

The “Birds of a feather” lunch seemed to go down well. I chatted about execution plans for over an hour. We didn’t really get too deep into exec plans, other than discussing the difference between estimated and actual plans (nothing other than the run-time information) and how statistics affect plans (I’m doing a full hour-long session on that on Thursday). Anyone interested in reading exec plans, see my blog series here or Grant’s e-book (available from SQLServerCentral).

After lunch there was a session on PowerPivot (formerly known as Gemini) and on scaling SQL beyond 64 processors. Interesting to see what kind of changes were needed to the SQLOS to handle that.

How are these for servers?

ScalabilityProof

Big Server

PASS 2009 – Pre-con

PASS 2009 is underway!

It’s getting to the point that these conferences feel like a family reunion (where the family members are people that you like). Last night, coming to registration there was a whole crowd of people that I know from previous conferences, newsgroups, blogs and forums. I’m not going to name everyone I ran into last night, simply because I’ll forget someone.

There were a lot of insider sessions today, but the only one that I attended was the introduction. Most of the topics were ones I wasn’t overly interested in. I spent the day at Adam’s CLR pre-con instead. CLR is an area that I’ve barely looked at and have little experience with, so it was a good choice.

Some very interesting info on CLR table valued functions. Apparently there’s no temp table/table variable involved. The CLR function streams the values back one row at a time. Might well remove the impact that the multi-statement table valued function has. Needs testing.

Late morning was spent at a Chapter Leaders meeting. Some of the resources available I didn’t know about. Things to look at when I get home. Also need, next month, to write up a report for the regional mentors on the meetings that we’ve had and what the attendance numbers look like. Something else to add to the to-do list.

Jack Corbett was at the meeting, one more person that I’ve chatted with online now met in person. Ran into Steve and Bob right after lunch. Good to see Steve again. Will run into Bob later today and chat.

Some of the comments on CLR and attention events were interesting. Basically, the CLR doesn’t receive attention events, so a timeout or an SSMS ‘stop’ can result in orphaned CLR objects that hang around for a while until the garbage collector comes across them and collects them.

Very interesting sample code for collecting system performance counters using a CLR proc (an unsafe CLR proc, because it uses DLLs that aren’t on the safe list).

CLR triggers. Just say no. Well, if it’s a DDL trigger with XML manipulation of the EventData() maybe. Possibly. Perhaps.

Or maybe not. A SQL DDL trigger with a CLR function to do the XML manipulation.

Some very interesting discussion of Code Access Security, Host Protection Attributes, Trustworthy and signed assemblies. It’s possible to have safe assemblies calling unsafe (external access) assemblies with the minimum amount of code marked unsafe (external access), have the database untrustworthy and still have everything working. Lots and lots of work though.

The day ended with the opening party (and the quiz bowl), followed by the SSC party. Good fun all round.

Some photos from the day

Blythe cookies

WeirdPerson

Quizbowl 1

QuizBowl 2

Quizbowl 3

QuizBowl 4