Execution Plans

Estimated rows, actual rows and execution count

It’s often said that a major discrepancy between estimated and actual row counts in a query’s execution plan is a sign of inaccurate statistics or a poor cardinality estimate and that it’s a sign of a problem. This is generally true, however there are places where the estimates and actual rows will differ, often quite dramatically, without it been a problem. The reason for this is that these two values show slightly different things.

Let’s take a look at an example. (table creation code at the end of the post)

select bt.id, bt.SomeColumn, st.SomeArbDate
from dbo.BigTable bt
inner join dbo.SmallerTable st on bt.SomeColumn = st.LookupColumn
where bt.id between 5000 and 5100

Estimated Actual discrepency

Estimated rows = 1, actual rows = 101. That’s a large discrepancy, but what caused it? It’s not out of date statistics (a usual cause) because the table has only just been created, so why is the estimation so far from the actual.

(more…)

Estimated and Actual execution plan revisited

After an interesting discussion on SQLServerCentral last week, I realised that the terms ‘estimated execution plan’ and ‘actual execution plan’ are perhaps a little bit misleading.

The only thing estimated about the estimated execution plan is the rowcounts, costs and row size. The plan itself isn’t an estimate. It’s not as if the optimiser, when asked for an estimated plan, does a less thorough job than when asked to compile a plan for a query’s execution.

The two forms of execution plan are better described as ‘execution plan with run-time information’ and ‘execution plan without run-time information’

When, in Management Studio, someone clicks the ‘display estimated execution plan’ button, the query is submitted to SQL Server, parsed and bound, algebratised and optimised just as if it was going to be executed. But the query is not executed, and as such, the plan when returned contains no run time information.

If there is a matching cached query plan, that cached plan is what’s returned and no optimisation is done. This can be seen by using profiler with the Cache hit, cache miss and cache insert events being traced.

When, in Management Studio, the query is run with the execution plan option enabled, the query is submitted to SQL Server, parsed and bound, algebratised, optimised and executed. The returned plan does contain the run-time for that specific execution, hence the plan contains things like ‘actual row count, actual IO cost’, etc

If there’s a matching query plan in cache then that cached plan will be used for the query’s execution and will be the one returned, though with the run-time information added

When a plan is cached, only the compile-time information is cached. The detailed run-time information on the actual number of rows and actual executions is discarded after updating the aggregated query stats. Hence, when you retrieve a query from the plan cache, it will not contain the run-time information. Consider a plan that’s been used 20 times. Which execution’s run-time information would it contain? Remember that there’s only one plan in cache per procedure.

Hence, a plan fetched from cache will be identical to the plan returned by requesting the estimated execution plan for a specific query (Providing there’s nothing happening to invalidate the cached plan)

Profiler can capture (depending on event) the plan without the run-time information or the plan with the run-time information. There’s a nice table in chapter 2 of one of Itzik’s books that shows the various profiler events, when they fire and what they return.

A Bookmark lookup, by any other name…

I think I may have confused some people by talking about bookmark lookups. I’ll attempt to clarify things.

The operator that I’m talking about is the one that fetches extra columns from the clustered index when the nonclustered index that’s used to retrieve the rows doesn’t have all of the columns required.

In SQL 2000, that operator appeared in the execution plan as a bookmark lookup and it appeared as follows:

In SQL 2005, the name was changed, and the bookmark lookup no longer appeared in the execution plan. In it’s place appeared a clustered index seek, joined back to the original index seek by a nested loop join. It appeared as follows (the highlighted operator is the ‘bookmark lookup’)

This change made it harder to see what was going on as clustered index seeks are usually considered ‘good’. The trick to see if it is actually a ‘bookmark lookup’ is to look at the objects involved. When the nonclustered index seek and the clustered index seek are both on the same table, then it’s a ‘bookmark lookup’

I don’t recall what this appeared as when the base table was a heap, not a cluster.

In SQL 2005 SP2, the name of the operator was changed again, now appearing as a key lookup (when the base table has a clustered index) or a RID lookup (when the base table is a heap). It now looks like this:

The thing to note is that it’s not the version of the server that’s important. The format of the XML for the exec plan has not changed since SQL 2005 RTM (I can and have created a .sqlplan file from SQL 2008 and opened that file in SQL 2005’s management studio).

It’s the version of management studio that affects how the execution plans are displayed. If the server is SQL 2005 SP3, but the client tools are still RTM, the bookmark lookup will appear as a clustered index seek. Another reason to patch the client as well as the server

I hope that clears up some of the confusion around the naming. So, in future, what should I refer to this as? A bookmark lookup? A Key lookup?

Execution plan – more properties

I ran across a few more properties visible in the exec plan that I thought would be useful to people. So, without further ado…

Some overall properties of the entire execution can be seen by selecting the highest-level operator in the plan (the one on the left-most, typically the select, insert, update or delete operator) and then opening the properties window

The first four items, the compild plan size, compile CPU, compile Time and compile Memory all refer to the optimisation process that the query went through. They indicate how much in the way of server resources was spent compiling this query and how large the cached plan is in memory.

This can be important when dealing with queries that recompile often or are very seldom reused

(more…)

An example exec plan

Back to the technical posts…

I’m going to take an example of a very simple execution plan that I used in a presentation a few weeks back and go over it in some detail, pointing out information that is available and what can be read from the plan.

The execution plan can be downloaded (in xml format) here – ExecPlanDem01.zip

The query that the plan came from is a very simple two table query. It’s not a very optimal plan, but that’s because I forced an index hint in order to generate a more interesting plan. Without the hint, it’s a simple exec plan with two index seeks and a nested loop join.

Overview

The first thing that can be seen from the execution plan is that most of the cost of the query is in two operations, a key lookup (formerly called a bookmark lookup) and a clustered index seek. The high cost of the key lookup is a good sign that the query is using an inappropriate index. (in a future post I’ll discuss using the exec plan to choose indexes)

(more…)

Execution plans – important properties

Time I wrote another piece on execution plans (see the first post for the full list)

So, now a quick look at some of the important properties of exec plan operators.

  • Estimated number of rows – This is the number of rows that the query optimiser estimates this operator will process. The estimate is based on the statistics and on other data available at time of compilation. It’s possible that this number includes a fraction, due to the way to optimiser does its estimates
  • Actual number of rows – This is the actual number of rows that were processed by the operator. This value is calculated by the query processor at execution time. An estimated execution plan will not include this. (more…)

Execution plan operations – misc

This is going to be the last post on the operators in an execution plan. I’m not going to cover all the other operators. There are a lot of operators, many quite specialised or only appearing in specific scenarios. Conor Cunningham’s writing a series covering the deep, dark details of some of the operations

So onto the operators. I’m going to cover Sort, Concatenate and Scalar Function.

Sort

Sort has three logical operators:

  • Sort
  • Distinct Sort
  • Top-N Sort

Sort is used to satisfy any ORDER BY that is specified in a query, if the index used does not naturally sort the rows in the desired way. It also may appear in the execution plan before a merge join or a stream aggregate. Sometimes SQL may decide that a sort followed by a merge join or stream aggregate is cheaper than a hash join or hash aggregate.

Distinct sort is used for DISTINCT, sometimes for UNION and sometimes for GROUP BY where no aggregates are specified.

(more…)

Execution plan operations – aggregates

On to aggregates.

I’m not going to dwell too long on the logical operators for aggregates, as they aren’t as interesting as the joins were. Rather I’m going to concentrate on the physical operators involved.

The two physical operators used for aggregates (and a couple of other logical operations) are:

  • Stream Aggregate
  • Hash Aggregate

Stream Aggregate

The stream aggregate is the faster and more efficient of the aggregate operators. For it to be used, the input rowset must be sorted in order of the grouping columns.

(more…)

Execution plan operations – joins

It’s about time I picked this series up again.

I’m not going to go into too much detail on joins. There are some very good articles elsewhere on joins. The important thing to notice about joins, in the context of an execution plan, is that there are six logical join operators and three physical join operators. The logical operators are what you ask for in the context of the query, the physical operators are what the optimiser picks to do the join.

The six logical operators are:

  • Inner Join
  • Outer Join
  • Cross Join
  • Cross Apply (new in SQL 2005)
  • Semi-Join
  • Anti Semi-Join

Craig Freedman wrote a long article on the logical join operators – Introduction to Joins

The semi-joins are the exception, in that they cannot be specified in a query. Nonetheless, they are present in disguise. They’re the logical operators for EXISTS, IN, NOT EXISTS and NOT IN. They’re used when matching is required, but not a complete join.

(more…)

Execution plan operations – scans and seeks

Another post in my ongoing series on reading execution plans. I know I’m jumping around a bit. I hope it makes some kind of sense.

I thought I’d quickly go over the seek and scan operations that can be seen in execution plans. There are 6 main ones. There’s a fair bit that I’m glossing over in this. I’ll get into some details at a later date.

Scans

  • Table scan. This operation only appears for a heap (table without a clustered index). The first page in the heap is located based on info in the system tables, and then the pages are read one by one, using the next and, if necessary, previous pointers in the page headers. This is generally an expensive operation and should be avoided where ever possible
  • (more…)