Or “Which column goes first?”
There’s a common piece of advice given about columns in an index key that says that the most selective column should go first. I’m not going to say that’s incorrect, because it’s not. The problem is that it’s often given without any explanation as to why the most selective column should go first, nor are the other considerations for index key order mentioned.
This can lead to misunderstandings like, in the extreme case, where one person after hearing that advice went and added the primary key column as the leading column of every single nonclustered index (because it’s highly selective), and then wondered why his database performance decreased dramatically.
The comment about selectivity is because of the way SQL keeps statistics on indexes (see my post on statistics for more info on what they are). SQL only keeps the histogram for the first column of the index. That means that it only knows the actual distribution of values of the first column. If the first column is not selective, the index may not be used. However, that’s not the whole story.
SQL also, in addition to the histogram, keeps density values for all of the left-based subsets of the index keys. So, for a 3 column index key, SQL knows the density of the first column, of the first and second and of all three. The density is, in a nutshell, a value that shows how unique the set of columns is. It’s 1/(distinct values). The value can be seen for any index using DBCC Show_Statistics with the DENSITY_VECTOR option.
This means, while SQL only knows the actual data distribution of the first column, it does know, on average, how many rows will be returned by an equality match on any left-based subset of the index keys
So, what’s my rule for the order of columns in an index key? Put the most selective columns first, when all other considerations are equal.