Latest Posts

How not to do a Masters degree

There are many, many guides to successfully completing a post grad degree, so I am not going to add to them. Instead, this is a list of things, based on personal experience, that you can do to make it somewhere between very difficult and impossible to complete that Masters or Doctorate degree.

To be clear, I’m talking about the degrees which are mostly, if not entirely, based on research and a dissertation, not coursework.

Do the degree while holding down a full-time job

Initially this looks like it’ll work out fine. Work Monday to Friday, work on the degree on Saturday and Sunday. For the first few months it does work fine.

But there’s one thing that a dissertation requires and that’s a large amount of dedicated time. Time to read the literature. Time to come up with the hypothesis or research questions. Time to design an experiment. Time to conduct that experiment. Time to revise the experiment, conduct it again, revise again, conduct again… Time to analyse the results. Time to write up the results. Time to edit, rewrite, edit, rewrite, edit…

As the months pass, and as the initial enthusiasm and fun fades, so it becomes harder and harder to spend most of the weekend on the research, weekend after weekend after weekend, for a couple of years

Sure, it’s possible to complete a Masters degree while working full time, but it’s like playing a game on Insane difficulty level.

Decide that you want to get heavily involved in the SQL community

The SQL community are a great bunch of people and there’s a lot of encouragement to jump in and get involved, for many good reasons

Posting on forums is great in many ways, it boosts your confidence, it gets you recognitions and it’s a good way to get to know features you’ll never use in your regular job.

Blogging and writing are a great way to share knowledge, and there’s a thrill from watching the page hit count go up, from the first comment, from the complements, especially when your blog post gets referenced by others as the authoritative article on a subject.

But it takes time. Lots of time. Articles can take days of work, blog posts can take anything from a few minutes to many hours depending on the subject and the depth of the post. Presenting takes lots and lots of prep time. An hour-long presentation can easily require a day of prep, and that’s once you’ve done several presentations. The first one can take many days of writing slides, rewriting slides, writing demos and rehearsing the presentation several times. Forums will take every minute you’re willing to give to them and more, and there’s the constant temptation of ‘just one more post…’

And where’s that time going to come from? The time that would otherwise have been spent on the research and dissertation.

Get burned out, and don’t seek help

I’m probably going to get flak for this, but it has to be said.

As an industry in general, we are too reluctant to ask for help. I don’t mean technical help (though that too in many cases).

We are too eager to put on a pedestal the person who works 12 hours a day, 6 days a week, the person who pulls all-nighters on a regular basis, the person who never takes vacation because he ‘doesn’t have time’. We’re conditioned to see extreme hours worked as a sign of dedication, of what is needed to get ahead.

It’s not heroic. It’s not required. It’s not something to be admired.

It’s stupid.

Long days are sometimes required, weekend work is sometimes necessary, but they both should be the exception, not the norm. Excessive overtime, if needed to meet deadlines, should be followed with a discussion on what went wrong such that the overtime was required. Was the estimation inadequate? Was the project analysis flawed? Did the scope creep (or gallop)? Were people working on multiple projects at the same time while the project plan expected them to be dedicated? Were assumptions not met or essential infrastructure delayed?

If overtime is frequently required, then management has failed at their job. A developer working 12 hours a day, 6 days a week for months on end is not a sign of dedication or heroics, it’s a sign that someone, somewhere in management is not doing their job properly.

This kind of workload and times are not sustainable. They lead to mistakes and buggy code, they lead to burnout and they lead to people, top people, walking away and never coming back.

I wrote about burnout a few years ago. Looking back now, after another recent brush with it, I don’t think I went far enough in what I said there.

It is not a flaw, it is not a weakness, it is not admitting failure, to seek professional help for burnout. It’s far better to do so than to suffer for years with the effects.

And to get this back on topic, trying to work through burnout is like trying to swim through syrup. Lots of effort, little progress, very easy to give up and stop trying. That’s not going to do wonders for that research and the couple-hundred page dissertation that needs to be done.

Despite all of that, in case anyone is wondering, graduation is mid-September

2011 Book review

Another year over and much as I did last year, I’m going to briefly go over the books I read this last year.

I will freely admit, very few of these could be considered ‘classic literature’, most is a mix of sci-fi, fantasy or adventure fiction. That’s just what I like to read.

Book total this year was 53, up from the 45 I managed in 2010 and above the 50 that I aimed for. Part of this is that I travelled more (and hence had time with nothing to do but read), part is because I took a couple of small vacations (and spent time reading) and part is due to getting an iPad and loading a couple of book apps on there.

The iPad is never going to replace real, physical, paper books for me. I love the smell of new books, the feel of the book (and you can’t read an iPad in the bath without significant risk). That said, it is convenient when travelling and for carrying a few books easily. It’s especially nice when getting books from Amazon. 6 week shipping vs immediate delivery. No contest there.

My choice for best books of the year:

  1. First Lord’s Fury (Codex Alera) by Jim Butcher. This is the climax of the Alera series and definitely the best of the bunch. Fast moving, tense, full of action and altogether an excellent ending for an excellent series. One thing I really like about this one: It doesn’t end with ‘happily ever after’.
  2. Elantris by Brandon Sanderson. This is a bit of a surprise. I got this on sale without too much in the way of expectations. Not to say I don’t like Brandon Sanderson, I’ve enjoyed everything of his that I’ve read, but this was his first published book and so I was willing to give it a little leeway. Not necessary. Good characters (though I’m sure I recognise that headstrong princess from a few places), good plot without too many holes and an intriguing mystery that all comes together logically in a way that leaves you saying ‘But, of course that’s the problem’. Definitely recommend and looking forward to more of his work.
  3. Star Trek: Destiny (trilogy). Yes, I’m recommending Star Trek novels. The world has not ended. I find most Star Trek novels are quickly churned out, mediocre novels. Average writing, average plotting (at best) and usually a reset button to return the universe to the way it was at the end. This trilogy is none of those. The plot works, it’s intertwined over three books and about four time-periods and the crew of at least four ships, and it works. It also leaves the universe dramatically changed (in a way that I did not foresee coming). Finally it’s one of the few time travel tales I’ve read that doesn’t leave me cringing.

Sooo… books per month.


You can almost see from that which months I was travelling or on holiday. June – trip to UK and a few days at leisure. Oct – trip to Pass and lots of time to read while travelling. Nov – Week away in the middle of nowhere.

Lastly, books per genre. Yes, I read a lot of fantasy. (note, these links go to the library pages on this blog, there’s a link to the Amazon page from there)

Science Fiction

  1. The Long Night of Centauri Prime (Babylon 5: Legions of Fire, Book 1) by Peter David
  2. The Light of Other Days by Arthur C. Clarke
  3. Star Trek: Destiny #3: Lost Souls by David Mack
  4. Star Trek: Destiny #2: Mere Mortals by David Mack
  5. A Confederation of Valor (omnibus) by Tanya Huff
  6. Star Trek: Destiny #1: Gods of Night by David Mack
  7. Earthfall (Homecoming) by Orson Scott Card
  8. Deathstalker by Simon R. Green
  9. Childhood’s End by Arthur C. Clarke
  10. Songs of Distant Earth by Arthur C. Clarke
  11. The Time Machine (SF Masterworks) by H. G. Wells
  12. The Call of Earth by Orson Scott Card


  1. The Phoenix Transformed (The Enduring Flame) by Mercedes Lackey, James Mallory
  2. Nightingale’s Lament (Nightside, Book 3) by Simon R. Green
  3. Elantris by Brandon Sanderson
  4. The Gunslinger (The Dark Tower) by Stephen King
  5. Agents of Light and Darkness (Nightside, Book 2) by Simon R. Green
  6. The Dragon Token (Dragon Star, Book 2) by Melanie Rawn
  7. Something from the Nightside (Nightside, Book 1) by Simon R. Green
  8. A Calculus of Angels (The Age of Unreason, Book 2) by J. Gregory Keyes
  9. Stronghold (Dragon Star, Book 1) by Melanie Rawn
  10. The Crystal City (The Tales of Alvin Maker, Book 6) by Orson Scott Card
  11. Guards of Haven: The Adventures of Hawk and Fisher by Simon R. Green
  12. The Last Olympian (Percy Jackson and the Olympians, Book 5) by Rick Riordan
  13. The Battle of the Labyrinth (Percy Jackson and the Olympians, Book 4) by Rick Riordan
  14. The Titan’s Curse (Percy Jackson and the Olympians, Book 3) by Rick Riordan
  15. The Sea Of Monsters (Percy Jackson and the Olympians, Book 2) by Rick Riordan
  16. The Lightning Thief (Percy Jackson and the Olympians, Book 1) by Rick Riordan
  17. Heartfire (The Tales of Alvin Maker, Book 5) by Orson Scott Card
  18. Alvin Journeyman (Tales of Alvin Maker, Book 4) by Orson Scott Card
  19. Prentice Alvin (The Tales of Alvin Maker, Book 3) by Orson Scott Card
  20. Red Prophet (Tales of Alvin Maker, Book 2) by Orson Scott Card
  21. Seventh Son (Tales of Alvin Maker, Book 1) by Orson Scott Card
  22. Rides a Dread Legion: Book One of the Demonwar Saga by Raymond E. Feist
  23. First Lord’s Fury (Codex Alera) by Jim Butcher
  24. Issola (Vlad Taltos) by Steven Brust
  25. Turn Coat (The Dresden Files, Book 11) by Jim Butcher
  26. Hawk by Simon R. Green
  27. Taliesin by Stephen R. Lawhead

Other Fiction

  1. Robert Ludlum’s The Lazarus Vendetta: A Covert-One Novel by Robert Ludlum, Patrick Larkin
  2. Robert Ludlum’s The Altman Code: A Covert-One Novel by Robert Ludlum, Gayle Lynds
  3. The Bourne Identity: A Novel by Robert Ludlum
  4. The Eyre Affair: A Thursday Next Novel (Thursday Next Novels (Penguin Books)) by Jasper Fforde
  5. Lords of the Bow by Conn Iggulden
  6. Robert Ludlum’s The Cassandra Compact: A Covert-One Novel by Robert Ludlum, Philip Shelby
  7. Robert Ludlum’s The Hades Factor: A Covert-One Novel by Robert Ludlum


  1. Three Roads to Quantum Gravity by Lee Smolin
  2. Expert SQL Server 2005 Development by Adam Machanic, Hugo Kornelis, Lara Rubbelke
  3. Notes From a Small Island by Bill Bryson
  4. 19 Deadly Sins of Software Security: Programming Flaws and How to Fix Them (Security One-off) by Michael Howard, David LeBlanc, John Viega
  5. The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition (2nd Edition) by Frederick P. Brooks
  6. On the Shores of the Unknown: A Short History of the Universe by Joseph Silk
  7. Warrior Soul: The Memoir of a Navy Seal by Chuck Pfarrer

SQL University: Advanced Indexing – Indexing Strategies

Right, I know it’s Friday and everyone’s tired and looking forward to the weekend, but I do need to finish off this indexing section and I’ll try to keep this short and interesting and hopefully keep everyone awake.

There’s no shortage of information available on how to create indexes. Hell, I’ve written a copious amount myself. Most of these many articles however are written from the point of indexing single queries. What you chose for a where clause, what has to go into the include to create the perfect index for this query. Now that’s all well and good, but I’ve never met a system that had only one query per table (maybe there is such a system out there, but I’ve never found it)

So what I’m going to try to do today is address the topic of a strategy for indexing. How to approach indexing, not for a single query, but for the system as a whole. I won’t be able to cover this in-depth, this is material worthy of an entire book chapter, if not an entire book, but I can at least touch on the essential portions.

Now, there’s two main positions that we could be in when considering indexing strategies for an entire system
1) A brand new system that’s still in development
2) An existing system that’s being used actively.

One at a time…

Indexing strategies for a brand new system

Start by choosing a good clustered index. What makes a good clustered index? Well, it depends 🙂

The clustered index is the base, it will affect each and every nonclustered index, and it’s not trivial to change once the system is in use, so chose carefully. I’m not saying another word on the subject of a clustered index, not today.

Once that’s done…


SQL University: Advanced Indexing – Filtered Indexes

Welcome back to day 2 of Advanced Indexing. Today we’re going to look at a feature that was added in SQL Server 2008 – filtered indexes.

In versions previous, indexes were always on the entire table. An index would always have the same number of rows as the table it was built on did (which is why COUNT(*) can just scan the smallest index on the table)

With filtered indexes, it’s possible to have an index that’s built on a subset of the rows in the table. The definition for a filtered index contains a WHERE clause predicate that determines if a row in the table will be in the index or not.

This can be a major advantage on really large tables where most queries are only interested in a small fraction of the table. A normal index would be based on the entire table regardless of the fact that most of the table is of no interest, meaning the index would be larger than necessary, deeper than necessary and take up more space than would be ideal. With a filtered index on just the interesting portion of the table, the index size is kept to a minimum, meaning it’s shallower than an index on the entire table and hence more efficient.

A simple example of a filtered index would be

ON Account (AccountNumber)
WHERE Active = 1;

There are two main uses of a filtered index:
1) Enforcing moderately complex uniqueness requirements
2) Supporting queries that filter on common subsets of a table

Filtered indexes and unique columns

One very interesting use of filtered indexes is in enforcing uniqueness over portions of a table. One requirement that come up again and again is to have a nullable column that must have unique entries in it, but whose entries are optional. Basically, the column must be unique or null. Sounds easy, but the problem is that a unique index allows only one null. So much for nulls not being equal to anything including other nulls.

Prior to SQL 2008 implementing such a constraint required computed columns, indexed views or triggers. With SQL 2008’s filtered indexes, it’s trivial.


SQL University: Advanced indexing – Sorting and Grouping

Good day everyone and welcome to another week of SQL University. I know we’re getting close to the end of the year and everyone’s looking forward to a nice long vacation soaking up the sun at the beach, but a little bit of attention would be nice. Thank you.

This week is Advanced Indexing, and I mean advanced, none of that selectivity, SARGable, predicate stuff that gets repeated all over the place. If you need a refresher on the basics before we get started, the following can be considered pre-requisite reading for this course

There’s also some additional background material available for more enthusiastic students:

Right, now that the admin has been handled, let’s get straight into things. Nothing like starting at the deep end…

Most people would rightly associate indexes with where clause predicates and joins, after all, the main usage of an index is to reduce the rows in consideration for a query as fast as possible. However there’s another portion of your queries that indexes can, if appropriately designed, help with – grouping and sorting.

Sorting is an extremely expensive operation, especially on large numbers of rows. For the academics in the audience, the algorithmic complexity of sorting is above linear, the time to sort a set of data increases faster than the number of items in the list. The common sorting algorithms have an average time complexity of O(n log n). It’s better than O(n2), but it can still hurt at the higher row counts.

O(n^2) O(n log n)

O(n2) on the left, O(n log n) on the right (Thanks to QuickMath)

Right, the non-academics can wake up now.

The other reason that sorting hurts is that it needs a memory grant to do the sort. If there’s a shortage of memory the query could have to wait a while for the memory to be granted and, if the optimiser mis-estimates the number of rows to be sorted, the memory grant could be insufficient and the sort would have to spill into TempDB. You don’t want that happening.

Finally, sort is a blocking operator in the execution plan (all rows must have been fetched from the source before any output can be returned), and so the client won’t start seeing rows returned until the entire sort has been completed. This can make the query feel like it takes longer than it really does.

Grouping and aggregation are much the same. To aggregate one set of values based on another set of values, SQL has to get all the like values of the grouping columns together so that it can do the aggregation. That sounds suspiciously like a sort doesn’t it?

SQL doesn’t always sort to do aggregations, but the alternative – hash tables – isn’t exactly free (homework exercise – read up on hash tables)

So for both sorting and grouping, the query processor’s job would be a lot easier if there was some way that it could get the data ordered by the sorting or grouping columns without having to do the work of actually sorting. Sounds impossible? No.


24 Hours of PASS Questions

I finally found the time to work through the questions from the 24 Hours of PASS session that I did. Thanks to everyone that attended the event

Q1: Can you filter execution plans for sort warnings?

No. The sort and hash warnings don’t appear in the execution plan. You’d have to trace for the hash and sort warning events and correlate that with either batch/statement started and completed events or with the run-time plan events.

Q2: To get the query executions can you just add the statement completed or batch completed events?

The Statement Completed and Batch completed events have durations and can be filtered on that duration. The problem however is that the execution plan events (eg showplan all, showplan xml, statistics profile, statistics xml) have no duration column. Hence the execution plan events can’t be filtered on duration even though the statement_completed and batch_completed events can.

Q3: If you run a Profiler trace and all the plans are being pulled from cache will that mean that no ShowPlan event data will be shown in the trace?

Depends which event is being traced. There are events for query compile (showplan all for query compile and showplan xml for query compile) that only fire when the query compiles, so those will not fire if the plan is being pulled from cache. The other execution plan events are all fired each time the query executes.

Q4: Why can you sometimes get nulls or blank strings for the query plan from sys.dm_exec_query_plan

From Books Online:

Under the following conditions, no Showplan output is returned in the query_plan column of the returned table for sys.dm_exec_query_plan:

  • If the query plan that is specified by using plan_handle has been evicted from the plan cache, the query_plan column of the returned table is null.
  • Some Transact-SQL statements are not cached, such as bulk operation statements or statements containing string literals larger than 8 KB in size. XML Showplans for such statements cannot be retrieved by using sys.dm_exec_query_plan unless the batch is currently executing because they do not exist in the cache.

Q5: Do the execution plans from the DMVs contain execution information?

No. The plans extracted from cache contain compile-time information only.

If you enjoyed this and are going to be at PASS Summit this year, it’s not too late to sign up for the all-day precon (All about Execution Plans) that Grant Fritchey (blog|twitter) and I are doing.

On hiatus

I’m taking a break from blogging while I finish off my thesis. With the pile of outstanding things that need doing and the deadlines for them, something has to give, and I figure the blog is the least impact of the lot.

Other than possible 24 Hours of PASS and PASS Summit feedback and some promised SQL University posts, the next blog post here will be a discussion of how not to go about doing a research degree while working full-time, which will be posted sometime after I get the thesis submitted.

Compiles and recompiles

I want to spend some time over the next few months looking at query compilation and the plan cache, and there’s a couple concepts that I want to get cleared up beforehand. The first of those is around two terms that are often used interchangeably, compile and recompile.


A compile occurs when a query is given to the query optimiser and, when it does a lookup into the plan cache, no matching plan is found. The optimism must then compile the query, generating an execution plan, must add that plan to the plan cache (in most cases) and then must pass that plan onto the query execution engine so that the query can be executed. (


A recompile is something slightly different. For a recompile, the optimiser must find a matching plan when it queries the plan cache, must hand that cached plan over to the query execution engine and then while doing validation checks the execution engine must determine that then query plan is no longer valid and request the optimiser to partially or completely recompile the query. (

Subtle difference. Both cases result in the optimiser generating an execution plan, but the reasons can be different. Also worth noting is that a compile results in a new plan in the cache, a recompile simply replaces an existing plan.

Another difference since SQL 2005 – a compile is for the entire batch, but a recompile can be for just a single statement within the batch.

Now the theory’s dealt with, let’s look at some examples and see how we can track these two events and try and get a better understanding of which occurs when and how they look.

The tools I’m going to use to track these are performance monitor with the compiles/sec and recompiles/sec counters and SQL Profiler with the event SP:StmtRecompile event (there’s no profiler event for compilation). I’ll also check what’s in the plan cache after each test.

The first one’s going to be very simplistic, a query run against an empty plan cache.


EXEC dbo.OutStandingTotalByStatus

What we get from that is a non-zero value for SQL Compilations/sec (perfmon) and the following from profiler (The SQL Recompiles/sec remains 0)


and the plan cache now contains one plan with one use. (for more info on how the CacheMiss and CacheInsert events work, see and


In this case, I hope it was clear, we had a compile occur (empty plan cache before, new plan added to cache).

Now what happens if, with no clearing of the cache nor anything else being done, I mark that procedure for recompile and run it again?


All about Execution Plans

Coming to the PASS Summit in October this year? Excellent!

I say excellent, because Grant (blog|twitter) and I are presenting a full-day seminar on the Monday, all about execution plans. Excited already? If not, let me give you a taste of what we’ll be discussing.

Grant will be kicking things off with a look at how the optimiser works. Not 500 level deep internals (that’d take all day by itself), but an overview of how the optimiser turns this

SELECT t.Title, u.UserName, u.UserLevel, t.LastModified, t.LastPoster
FROM Threads t
INNER JOIN Posts p on t.ThreadID = p.ThreadID
INNER JOIN Users u on p.Poster = u.UserID
WHERE t.ThreadID = 42

into this


as well as some DMVs that give a view on what the optimiser is doing.

After that, I’ll be digging into what the plan cache is, how it’s structured and how plans are matched to incoming queries. I’ll also cover how you might go about monitoring SQL’s usage of the cache.

After that we’ll discuss the myriad and varied ways to get execution plans out of SQL and what the various different options for that return and what some of the terms involved really mean (estimated execution plans anyone?).

Once all that’s out there (which will probably take the entire morning) it’ll be onto the fun of reading the execution plans, what those various icons represent and what all their arcane properties are telling you. Maybe, just maybe we’ll also have a look at the raw XML form of the plan, just for fun.

And because theory’s not very useful without something to apply it to, we’ll be ending off the day by seeing how you can use the information from the execution plan to tune your queries. After all, everyone wants better performance, right?

Still not completely convinced? Then check out the 24 hours of PASS coming in September. Both Grant and I are doing a pre-con preview. I’m doing extracting execution plans and Grant’s doing a portion of the reading execution plans section.

Right, now that everyone’s dying to attend this, I’m going to toss out a little bit of a challenge. If you’re coming to our precon and you have an execution plan that you don’t understand (maybe a strange operator, maybe a property with a value you’ve never seen), mail it to me (gail@<blog domain>) with a note as to what you want clarifying and we might discuss it during the pre-con. Some rules, just to keep things sane

  • No confidential stuff. I don’t want legal trouble, neither do you.
  • If the exec plan takes 5 minutes to open in SSMS, I’m deleting it.
  • If it’s multiple pages in SSMS, I’m deleting it.
  • I don’t promise to look at anyone’s plans, it depends how we’re doing on time.

Goodbye IsNumeric hell

A well overdue feature introduced in Denali CTP 3 is that of the Try_Parse and Try_Convert functions. These are great for dealing with the something that’s frustrated SQL developers for years – data type conversions.

Let’s imagine a rather nasty case, a file with some values in it that once imported into SQL (as character data) looks something like this:


Ewww… that’s a mess. Let’s see if we can identify which of the values can be converted into a numeric data type. The function prior to Denali for that was ISNUMERIC.

FROM BadNumerics   


Great, so other than the obvious one, they’re all numeric data. Time to get converting.

SELECT CAST(ANumber as Numeric(18,6))
  FROM BadNumerics

Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to numeric.

Err, so they’re numeric but can’t be converted to numeric. That’s fun. Maybe another of the numeric data types will work…