SQL Server

SQL 2008 training

I’ve run into a bunch of free SQL Server 2008 training over the last few days.

Execution plan operations – joins

It’s about time I picked this series up again.

I’m not going to go into too much detail on joins. There are some very good articles elsewhere on joins. The important thing to notice about joins, in the context of an execution plan, is that there are six logical join operators and three physical join operators. The logical operators are what you ask for in the context of the query, the physical operators are what the optimiser picks to do the join.

The six logical operators are:

  • Inner Join
  • Outer Join
  • Cross Join
  • Cross Apply (new in SQL 2005)
  • Semi-Join
  • Anti Semi-Join

Craig Freedman wrote a long article on the logical join operators – Introduction to Joins

The semi-joins are the exception, in that they cannot be specified in a query. Nonetheless, they are present in disguise. They’re the logical operators for EXISTS, IN, NOT EXISTS and NOT IN. They’re used when matching is required, but not a complete join.

(more…)

Temp tables and table variables

I’ve encountered a fair bit of confusion on various forums as to the differences between temporary tables and table variables. As a quick article (I’m knee-deep in some AI stuff at the moment) I thought I’d quickly go over some points on temp tables and table variables.

Temporary Tables

  • Created using the Create table syntax, preceding the table name with a’#’ for a local temp table and ‘##’ for a global temp table
  • Allocated storage space within the TempDB database and entered into the TempDB system tables 1
  • The table’s actual name is the name is was created with, a large number of underscores and a hash value, to prevent object name collisions if two connections create a temp table with the same name
  • Can have a primary key, defaults, constraints and indexes (however the names of these are not hashed, possibly leading to duplicate object errors for constraints and defaults)
  • May not have triggers.
  • Foreign keys are permitted, but are not enforced
  • Have column statistics kept on them. The algorithm for determining when to update is different to permanent tables
  • Exist until they are dropped, or the connection closes.
  • Are visible in any child procedures called from the one where the table was created. Are not visible to parent procedures
  • Are not persisted to disk unless there is memory pressure, or the table is too large to fit in the data cache

(more…)

Parameter sniffing

This seems to come up again and again on the forums.

At its heart, parameter sniffing is the ability of the SQL Server optimiser to know the values of parameters passed to a stored proc at the point that it compiles the procedure. The idea is that if the parameter values are known, then the appropriate column statistics can be used and the optimiser can estimate the number of rows that the various query operators will have to process for various different possible execution plans.

Since the approximate number of rows is known, the cost of each possible plan can be more accurately calculated and a more accurate execution plan can be selected.

So, why is parameter sniffing so often a problem? Well, mainly, because parameter values do change.

(more…)

SQL Server 2008 Nov CTP

The latest SQL Server 2008 CTP is available on Connect.

Notable features new in this CTP:

  • Intellisense in Management studio. (About bloody time). This was demo’d at PASS back in Sept.
  • Change tracking. What we’ve done all these years with triggers, just without the triggers.
  • The Performance warehouse. This I really want to play with, even if it’s just to see what pieces I can use on 2005 or 2000 servers.
  • Plan freezing. To be able to freeze the plan cache at a point in time, and to move query plans from one server to another. I see this as a fairly advanced feature, probably only really useful in specific cases. (and probably a feature that will be badly abused)
  • Resource Governor. This must be my number one anticipated feature for 2008. The governor allows the creation of limitations and priorities based on properties of the connection, eg login name, application name, host name. It’ll be a very nice way of limiting the damage that ad-hoc queries do to a server’s performance, without stopping them outright.
  • Geospacial data types and functions.
  • The filestream data type

Now I just need to install virtual PC on my laptop and it’s playtime…..

Execution plan operations – scans and seeks

Another post in my ongoing series on reading execution plans. I know I’m jumping around a bit. I hope it makes some kind of sense.

I thought I’d quickly go over the seek and scan operations that can be seen in execution plans. There are 6 main ones. There’s a fair bit that I’m glossing over in this. I’ll get into some details at a later date.

Scans

  • Table scan. This operation only appears for a heap (table without a clustered index). The first page in the heap is located based on info in the system tables, and then the pages are read one by one, using the next and, if necessary, previous pointers in the page headers. This is generally an expensive operation and should be avoided where ever possible
  • (more…)

DateTime Manipulation

The date time data type and the date time functions within SQL are things that I see coming up time and time again in news groups and forums. Questions on how to get rid of the time, how to get the first day of the week, the last day of the month and so on. With the new Date and Time data types coming in SQL 2008, things will get easier, nut until then we have to do things the hard way.

In systems I’ve worked on I’ve seen several implementations of functions to find the first and last day of a week, a month or a quarter. Some have worked well, some have worked and others, well, haven’t

(more…)

Memory and SQL 2005 SP2

Or “Why are all my processes waiting on memory. There’s tonnes of memory

It’s probably not new news that there was a fairly nasty memory-related bug in SQL 2005 RTM and SP1 that was related to the relaxing of limits on cache size. Specifically the TokenAndPermUserStore cache.

On systems with large amounts of memory (20GB+) and frequent ad-hoc queries or significant usage of dynamic SQL, the cache can grow quite large, and by quite large I’m talking upwards of 2GB. I think I saw the cache at close on 8GB at one time on one of my servers.

The problem with this is that is takes quite a bit of time to search through several GB of cache to find the required tokens. Making matters worse, access to that cache is synchronised, so only a single thread may have access at a time.

The main symptom of that problem is lots of CMEMTHREAD waits without an apparent wait resource and a higher than normal CPU usage.

But that problem was fixed in SP2 with a change to the caching behaviour. Right?

(more…)

Indexes for aggregates

It’s well known that indexes on columns used in where clause and for joins is a good thing in SQL, but what about other places. How about on aggregates?

Consider a simple table with an amount and a customerID. It’s a common requirement to calculate the total amount that each customer has paid. No conditions are enforced, so this would seem like a place where an index won’t help. Well, let’s see. (sample code at end)

The clustered index (and hence the physical order of the rows) is on the identity column.Take the following query.
SELECT CustomerID, SUM(Amount) FROM Payments group by customerID

(more…)