All indexes are unique

Well, that’s a rather contentious title. There are probably several people shaking their heads at this point. Let me explain.

I was listening to a podcast with Kimberly Tripp this morning, and she mentioned this briefly. I thought it would be a good discussion to end a short series on indexes and selectivity.

The Clustered Index

A clustered index has to be unique, because the clustering key acts as the row’s location in the table. If the index is not defined as unique, SQL will make it unique by adding a uniquifier, a 4-byte integer that’s hidden behind the scenes and is added when necessary to make the clustered index unique.

It’s not documented anywhere clearly, but it is mentioned in a couple of places. From msdn:

If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

So all clustered indexes are unique.

The Nonclustered Index

A nonclustered index contains, in addition to the index key and any include columns, a pointer to the actual row. This is so that the row can be retrieved when other columns are needed for a query (A bookmark lookup)

When the table has a clustered index, this pointer is the clustered index key. When the table does not have a clustered index, the pointer is the RID, a combination of file ID, page ID and slot index (which gives the row’s logical position on the page). These pointers are not just stored at the leaf level of the index, they’re stored at the higher levels as well, something that a bit of poking with DBCC Page can easily verify. (Unless the nonclustered index is defined unique, in which case it’s just at the lead level)

As was proven above, the clustering key is unique. The RID, since it points to the row’s actual position, is also unique. There’s no way that two rows can be in the same place on a page.

Hence, since part of the nonclustered index is unique, the entire index has to be unique.

So all nonclustered indexes are also unique.

Q.E.D.

9 Responses to “All indexes are unique”

  1. That’s a really weird way to think about it, but you know what, it’s utterly true. Well done.

  2. The main intention was to make people think about index structure. So many questions on indexing are simple if the structure if the index is understood.

  3. [...] All Indexes Are Unique – no, really.

  4. [...] Gail Shaw’s post All indexes are unique, reminded me of an interesting piece of trivia I discovered back in SQL Server 7. The error message [...]

  5. just came across this post, really opened my eyes to indexes (know it sounds very dramatic) :P

  6. Hmmm…

    This has been documented in SQL Server 7.0 courseware.

  7. It’s in the 2005/2008 documentation as well, if one is willing to read through lots of different topics.

  8. Hello – have been recently reading your posts, and wanted to ask a very rookie-like question…When is it good to use a NonClustered Index on the Primary Key of a table, over a Clustered one? I am about to deploy a new setup, and went through to verify that the Primary Keys on all of my tables were in fact – Clustered. They weren’t, and when I asked why the only explanation I got was s shoulder shrug. These are programmers that put it together.

    So needless to say – I changed them all to Clustered Index Primary Keys. Can you think of a reason or time when it would be a good thing to go with a NonClustered Primary Key?

    Thanks and keep up the great work!

  9. There are many times that it’s a good thing to have a nonclustered primary key. Main one is when there’s a better place for the clustered index (eg because of access path or data type(s) of the primary key)

    Unfortunately there is no simple, always applicable, straightforward answer here.

    Just make sure that if the PK is nonclustered, there is a clustered index somewhere else.

Leave a Reply