Author Archive: Gail

SA SQL Usergroup – November

The November meeting of the SQL Usergroup (JHB) will be held on Friday the 28th of November. Steve Simon, of State Street Corporation, New York, will be presenting a session titled “DTS 2000 migration to SSIS…makes financial sense !!”

While the migration of DTS 2000 packages to SSIS is relatively efficient utilizing the Package Migration Wizard, mutual fund and other financial packages are often more involved and thus require a bit more hands-on effort. This “best practices”, hands-on presentation will show the attendee the practices that we have implemented to convert our existing “far east” packages to run efficiently and effectively within the SQL Server 2005 environment. The discussion will also include ways and means that we have undertaken to reduce as many back-end scripts as possible by replacing them with more Control and Data Flow objects.

I don’t have a confirmed venue yet, but it should be the Microsoft office, Bryanston, 18h30. If those details change, I will update this post.

Please leave a comment if you intend attending.

Identifying inaccurate statistics

I wrote previously about statistics and what they’re needed for, and briefly mentioned what can happen when they’re inaccurate.

One question I’ve seen asked a few times is how to identify stats that are inaccurate, and which DMV to use. The bad news is that there is no DMV that identifies inaccurate statistics. The STATS_DATE function can be used to see when the stats were last updated, but that doesn’t say whether that was too long ago or not. If a table is read-only, or only changes twice a year, statistics that are four months old are perfectly valid.
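As a quick sketch (the table name dbo.SomeTable is just a placeholder), STATS_DATE can be run against every statistics object on a table via sys.stats:

    SELECT s.name AS stats_name,
           STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID('dbo.SomeTable'); -- placeholder table name

That tells you when the stats were last updated, not whether they’re still good enough.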

The rowmodctr column in sysindexes can give an approximate count of the number of rows that have changed since the stats were last updated, although sysindexes is deprecated and will disappear in a future version of SQL, and the rowmodctr column is no longer completely accurate. Rowmodctr, however, is just a count of changes. It gives no indication of whether that number is high enough to be causing a problem. In some cases 50% of the table may have changed before there’s a problem with queries. In other cases 5% is enough to cause problems.
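For what it’s worth, here’s a sketch of that check (dbo.SomeTable is again a placeholder; remember that the view is deprecated and the counter only approximate):

    SELECT name, indid, rowmodctr -- approximate changes since the last stats update
    FROM sysindexes
    WHERE id = OBJECT_ID('dbo.SomeTable');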

(more…)

What are statistics and why do we need them?

Statistics are another area of SQL that seems to be fairly poorly understood: what they are, how they are used and how they are maintained.

Statistics store information about the distribution of data in the columns. The information includes the density of the key columns, which gives a measure of the uniqueness of the index, and a histogram that stores information on the distribution of values within the column.
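To see all of that for a single statistics object, DBCC SHOW_STATISTICS returns the header, the density vector and the histogram (the table and statistics names below are placeholders):

    DBCC SHOW_STATISTICS ('dbo.SomeTable', 'IX_SomeTable_SomeColumn');
    -- Returns three result sets: the header (rows sampled, last update time),
    -- the density vector, and the histogram for the leading column.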

Creating statistics

When an index is created, statistics will automatically be created on the index columns. The stats on an index cannot be dropped while the index exists. If the index is a multi-column one, the density is stored for all columns, but the histogram is only stored for the leading column.

It’s also possible to have statistics that aren’t associated with indexes. These are created automatically by the query optimiser (providing auto_create_statistics is enabled) or they can be created manually with the CREATE STATISTICS statement.
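A minimal sketch of the manual route (the names are placeholders; stats that the optimiser creates automatically get system-generated names starting with _WA_Sys_):

    CREATE STATISTICS st_SomeTable_SomeColumn
    ON dbo.SomeTable (SomeColumn)
    WITH FULLSCAN; -- read every row rather than sampling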

It’s not usually necessary to create statistics manually as the optimiser usually knows best what stats are required.

Updating Statistics

SQL keeps a count of the number of changes to a column and will update the statistics when it deems sufficient changes to have occurred (providing auto_update_statistics is enabled). The stats are not updated as part of the data modification itself. Rather, the next select that requires those statistics will trigger the update. In most cases the auto-update doesn’t cause problems. It runs in the read-uncommitted isolation level and hence won’t cause blocking, and SQL will, by default, only sample a portion of the table to calculate the stats.
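As a sketch, the database-level setting can be checked, and an update forced by hand, along these lines (the table name is a placeholder):

    SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
    FROM sys.databases
    WHERE name = DB_NAME();

    -- Force an update; this samples by default, add WITH FULLSCAN to read every row.
    UPDATE STATISTICS dbo.SomeTable;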

(more…)

Virtualization launch event

I got roped into speaking at the Microsoft SA Virtualization launch earlier this week (Thursday 23rd October). It went fairly well, though I do need to work on pacing for these kinds of things. That said, I only had three days to prepare everything, so it didn’t go too badly. 😉

I did two hour-long sessions on new features for developers in SQL 2008. Among the things covered were:

  • Data Compression
  • Datetime data types
  • Table types
  • Table parameters
  • Grouping sets
  • Merge
  • Sparse columns
  • Filtered indexes
  • Geospatial data types and indexes
  • Filestream
  • HierarchyID

The slides are available if anyone wants them – What’s New For Developers. (2.5MB)

What is fragmentation?

It’s common knowledge that SQL indexes become fragmented over time and should be rebuilt to remove that fragmentation. But what exactly is fragmentation?

First, a little bit on index structure.

The index key defines the logical order of the index leaf pages. Each page has two pointers, one to the previous page and one to the next page in the index.

A hypothetical index on an integer column

In an ideal situation, the logical order of the pages would correspond to the physical order, and a page with higher index key values would have a higher page number (corresponding to physical position in the file) than a page with lower index key values.

If that is the case for all pages in the index leaf level, then the index has no fragmentation. Unfortunately, that is not normally the case. Inserts and updates to the index can cause page splits. A page split moves half the rows on one of the index leaf pages to a new page and then links that new page into the chain. It’s unlikely that a free page will be in the correct physical location in the file, and hence after the page split the physical and logical ordering of the pages will no longer be the same.
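To actually see that fragmentation, sys.dm_db_index_physical_stats reports the percentage of out-of-order pages. A sketch (SQL 2005 and later; dbo.SomeTable is a placeholder):

    SELECT i.name AS index_name,
           ips.index_level,
           ips.avg_fragmentation_in_percent, -- logical (out-of-order page) fragmentation
           ips.page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.SomeTable'), NULL, NULL, 'LIMITED') AS ips
    INNER JOIN sys.indexes AS i
        ON  i.object_id = ips.object_id
        AND i.index_id = ips.index_id;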

(more…)

PASS 2008 community summit

The (almost complete) schedule for the 2008 PASS conference is now available and there are some fantastic looking sessions.

Some of the highlights of the schedule (at least for me) are:

The big news (not that it’s exactly news by now) is that I’m speaking at the summit this year. I’m presenting a session on how to write badly-performing queries. The session is late on Friday afternoon and is immediately followed (in the same room) by Adam Machanic and Peter DeBetta’s session on Antipatterns and Malpractices. Looks like a fun afternoon focused on how things shouldn’t be done.

More exams

I wrote the second of my 2008 exams yesterday, doing the beta of the MCTS (Database Developer) cert. While writing it I realised I don’t know half as much as I should about XML, CLR, Service Broker and a couple of the newer additions to T-SQL.

All in all, it didn’t seem hard, but that’s hard to judge from a beta exam. I do think there was perhaps too high a focus on XML. About 10% of the questions I had were on that, but that may have been the luck of the draw.

I did the beta of the MCITP (Database Admin) about a month ago. It was a tough exam. No case studies like the 2005 exams had, but many of the questions felt like mini-case studies themselves. I had one question that was a screen and a half long. That was just the question. The answer list was over a screen long as well. All I’m going to say about that one is that you really, really need to know backup strategies to write it.

Next up, the MCTS (Database Admin). If I do that, and if I pass the two betas (I’ll only find out when the exams are officially released), then there’ll only be one left to do, the ITP for DB Dev.

Execution plan – more properties

I ran across a few more properties visible in the exec plan that I thought would be useful to people. So, without further ado…

Some overall properties of the entire execution can be seen by selecting the highest-level operator in the plan (the left-most one, typically the select, insert, update or delete operator) and then opening the properties window.

The first four items, the compiled plan size, compile CPU, compile time and compile memory, all refer to the optimisation process that the query went through. They indicate how much in the way of server resources was spent compiling this query and how large the cached plan is in memory.

This can be important when dealing with queries that recompile often or are very seldom reused.
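Those same values are exposed as attributes of the QueryPlan element in the plan XML, so they can also be pulled out of the plan cache. A rough sketch, not the only way of doing it (attribute names as per the showplan schema):

    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT TOP (20)
           cp.usecounts,
           qp.query_plan.value('(//QueryPlan/@CachedPlanSize)[1]', 'int') AS CachedPlanSize,
           qp.query_plan.value('(//QueryPlan/@CompileTime)[1]',    'int') AS CompileTime,
           qp.query_plan.value('(//QueryPlan/@CompileCPU)[1]',     'int') AS CompileCPU,
           qp.query_plan.value('(//QueryPlan/@CompileMemory)[1]',  'int') AS CompileMemory
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    WHERE qp.query_plan IS NOT NULL
    ORDER BY cp.usecounts DESC;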

(more…)

South African SQL Server User Group revived

The first meeting of the revived South African SQL Server usergroup will be at 18:30 on Tuesday the 21st of October. The meeting will be held at the auditoriums at the Microsoft office, 3012 William Nicol Drive, Bryanston, Johannesburg. (directions)

I will be presenting a talk titled Effective Indexing. I’ll be looking at the considerations for selecting clustered and non-clustered indexes, what should and should not be indexed, and how queries affect the decision on indexing. If there’s time, I’ll also talk briefly on index maintenance. The talk is a slight modification of the one that I presented at TechEd South Africa this year.

The event is open to anyone interested in SQL Server. Please post a comment if you’re interested in attending.