Identifying inaccurate statistics

I wrote previously about statistics, what they’re needed for and I briefly mentioned what can happen when they’re inaccurate.

One question I’ve seen asked a few times is on how to identify stats that are inaccurate, what DMV to use. The bad new is that there is no DMV that identifies inaccurate statistics. The Stats_Date function can be used to see when the stats were last updated, but that doesn’t say if that was too long ago ort not. If a table is readonly, or is changed twice a year, statistics that are four months old are perfectly valid.

The rowmodcnt column in sysindexes can give an approximate count of the number of rows that have changed since the stats were last updated, although sysindexes is deprecated and will disappear in the next version of SQL and the rowmodcnt column is no longer completely accurate. The rowmodcnt however is just a count of changes. It gives no indication of whether that number is too high and causing a problem. In some cases 50% of the table may have changed before there’s a problem with queries. In other cases 5% is enough to cause problems.

The only certain way to identify statistics that sufficiently inaccurate as to cause problems is to see the effect on queries. The typical sympton is a query that runs well one day and terribly the next with no changes to schema and only normal changes to data. Identifying those cases requires that there’s a benchmark of how queries normally run and that it’s possible to identify abnormal performance.

If a stats update (sampled or with full scan) fixes the performance problem, then it confirms that the problem was indeed statistics.

A look at the execution plan of a badly-performing query can also identify statistics problems. If the estimated and actual rows affected by an index seek are very different (more than about 10%) then it can indicate that the statistics are old or inaccurate. I say can, because there are other reasons why the estimates are incorrect.

The truth is, whether stats are inaccurate or not depends more on the specific queries that are using the statistics than anything else. It’s even possible that two queries running against the table will be affected completely differently by old stats. One may run terribly and the other not be affected at all. It all depends on what the queries do and how the optimiser optimises them.

2 Responses to “Identifying inaccurate statistics”

  1. Hi, I just wanted to simply ask about other reasons why the estimates can vary from actual rows. I, for example, have often situation that there is estimate 1 row, and actual more than 30000, and in consequence there is nested loop. It seems that is, somewhat, optimal for this case, but I’m still concerned about this.
    And thanks for your articles, these are brilliant!

  2. Well first you need to check whether that is correct or not (http://sqlinthewild.co.za/index.php/2009/09/22/estimated-rows-actual-rows-and-execution-count/)

    It could be the use of a table variable (estimates 1 row), it could be a multi-statement table-valued function for the same reason. Multiple predicates can also result in incorrect estimates as the optimiser assumes they are independent, when they may not be. Use of a function in a predicate. Lots of possible reasons

Leave a Reply