Latest Posts

SA SQL Usergroup May meeting

The May meeting of the South African SQL Server Usergroup will be held on Tue 19 May, 18h30 at the Microsoft Offices in Bryanston, 3012 William Nicol Drive. This month we have two speakers: Cedric Labuschagne and Clinton Scholtz.

Cedric will be presenting “Performance Monitoring using Management Studio (SSMS) – Introduction to the Management Data Warehouse”

Understand the architecture of how SQL Server 2008 collects performance and diagnostic data and presents the information in Management Studio. Develop a plan for enabling performance data collection and analyzing the results for troubleshooting and diagnostics. Learn how to create your own collection sets and manage the performance data collection warehouse database.

Clinton will be presenting “The Basics and Fundamentals of Service Broker”

Please let me know if you’re coming, I need numbers for catering purposes.

Look forward to seeing you there.

High Availability != Backups

Backup solutions are not the same as high availability solutions. They have different purposes, and picking the wrong one for the wrong reasons may be disastrous.

A backup solution is one that creates separate copies of data that can be used to restore the original in the case of data loss. A high availability solution is one designed to keep a system usable and available in the face of hardware (or similar) failures.
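To make the distinction concrete, here’s a minimal sketch of the backup side (the database name and file paths are purely illustrative):

-- Periodic full backups, plus frequent log backups for point-in-time recovery
-- (log backups require the full or bulk-logged recovery model)
BACKUP DATABASE AdventureWorks
TO DISK = 'D:\Backups\AdventureWorks_Full.bak'

BACKUP LOG AdventureWorks
TO DISK = 'D:\Backups\AdventureWorks_Log.trn'

What those files buy you is restorability after data loss; they do nothing to keep the server itself available.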

Solutions like RAID, database mirroring, clustering and SAN replication are forms of high availability. They are not backup solutions and cannot replace good database backups. Having the database on a RAID array will not help if the DB becomes corrupt. SAN replication or database mirroring is of no use if a user somehow deletes critical records.

On the other hand, a backup strategy, however good, is not going to help much when a critical piece of hardware burns out and the business will be losing millions if the app’s not up within 10 minutes. If the app is critical, both backups and an appropriate high availability solution have to be considered. Picking the right high availability solution is complex. Entire books have been written on it.

The only time that having no backups at all is an acceptable option is when the data can be completely recreated from another source with no difficulty (an example would be a reporting database that’s replicated directly from a different server).

Here’s another take on backups – http://www.sqlskills.com/BLOGS/PAUL/post/Importance-of-having-the-right-backups.aspx

Functions, IO statistics and the Execution plan

It’s no secret that I’m not overly fond of most user-defined functions. This isn’t just a pet hate; I have some good reasons for disliking them. All too often they’re performance bottlenecks, but that can be said about many things in SQL. The bigger problem is that they’re hidden performance bottlenecks, ones that often go overlooked and ignored for too long.

I’m going to start with this fairly simple scalar function, created in the AdventureWorks database.

CREATE FUNCTION dbo.LineItemTotal (@ProductID int)
RETURNS money
AS
BEGIN
    DECLARE @Total money

    SELECT @Total = SUM(LineTotal)
    FROM Sales.SalesOrderDetail
    WHERE ProductID = @ProductID

    RETURN @Total
END

So, given that function, the following two queries should be equivalent.

SELECT ProductID, ProductNumber, dbo.LineItemTotal(ProductID) AS SumTotal
FROM Production.Product p

SELECT ProductID, ProductNumber,
    (SELECT SUM(LineTotal) FROM Sales.SalesOrderDetail WHERE ProductID = p.ProductID) AS SumTotal
FROM Production.Product p

No problems so far. They both return 504 rows (in my copy of AW, which has been slightly padded out with more data). Now, let’s look at the execution characteristics by running them again with Statistics IO and Statistics Time on.
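For anyone following along, the figures below were captured by turning the session-level statistics options on before running each query:

-- Report IO per table and CPU/elapsed time per statement for this session
SET STATISTICS IO ON
SET STATISTICS TIME ON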

Query 1, the one with the scalar function:

Table ‘Product’. Scan count 1, logical reads 4, physical reads 0.

SQL Server Execution Times:
CPU time = 47297 ms,  elapsed time = 47541 ms.

Query 2, the one with the correlated subquery:

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0.
Table ‘SalesOrderDetail’. Scan count 3, logical reads 22536, physical reads 0.
Table ‘Product’. Scan count 3, logical reads 40, physical reads 0.

SQL Server Execution Times:
CPU time = 1047 ms, elapsed time = 1249 ms.
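The function version burns over 45 seconds of CPU; the correlated subquery does the same work in about a second. One common workaround for this kind of hidden cost, sketched here purely as an illustration (the inline function name is my own), is to rewrite the scalar function as an inline table-valued function, which the optimiser can expand into the main query plan rather than executing row by row:

CREATE FUNCTION dbo.LineItemTotalInline (@ProductID int)
RETURNS TABLE
AS
RETURN (
    SELECT SUM(LineTotal) AS SumTotal
    FROM Sales.SalesOrderDetail
    WHERE ProductID = @ProductID
)
GO

-- Invoked with CROSS APPLY, this behaves much like the correlated subquery above
SELECT p.ProductID, p.ProductNumber, t.SumTotal
FROM Production.Product p
CROSS APPLY dbo.LineItemTotalInline(p.ProductID) t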

(more…)

Great Leaders

I got tagged by both Brent and Grant on Chris Shaw’s latest quiz.

Who has been a great leader in your career and what made them a great leader?

I think the person who stands out the most is my previous manager. I’m not going to name him, and I’m not going to drone on for pages about why I think he qualified; instead, I’m going to give a couple of examples of things that happened.

A couple of years back I realised that I was so busy keeping up with my workload that I wasn’t finding time to keep up with new developments, new ways of doing things, new releases, etc.

I went to my manager and explained that I was supposed, as part of my job responsibilities, to keep up with new developments and share that information with the rest of the team where applicable. However, the pile of optimisation work was overwhelming. Would he mind if I put aside half an hour a day just to read books, blogs, support articles and so on, even though I had so much other work piled up that it would take a couple of months to finish it all? His immediate reply: “Are you sure half an hour’s enough?”

Another time I was sitting in his office summarising the cause of a DB outage. It had been a deployment problem, if I recall; it shouldn’t have happened. One of the high-level business managers stormed in and demanded to know who he should shout at for this. My manager’s reply: “You shout at me.”

We all got shouted at later, but by our manager, not by the business person, and it was more a case of “This should not have happened, it won’t happen again, will it?”

When I decided to leave that company, the hardest part was telling my manager that I was leaving. It turned out that he resigned two months after I left, for very much the same reasons.

Hmmm, now, who to tag?

I think Paul Randal and Christopher Stobbs

Developers vs DBAs

I was listening to a podcast on nHibernate this morning. Most of it was very good, but there was one section that absolutely made me see red.

The guest on the show was asked how the DBAs felt about the use of nHibernate for a new app on an existing (old) database. The reply essentially was ‘We haven’t told them and we’re not going to tell them.’ Apparently the plan was to only tell the DBAs about this app if there was a problem with it in production.

Now there’s a whole lot of problems with that scenario, and I can sum most of them up in one word – Trust.

If an application were mostly to be written in C#, but a critical portion depended on (say) BizTalk, no one would seriously suggest keeping the BizTalk expert on staff in the dark and doing it all themselves. So why do developers do that with databases?

I’ve often heard developers complain that the DBA doesn’t trust them. Guess what: pull this kind of stunt and they’ve got a good reason not to trust those developers. Go behind someone’s back, hide stuff from them, and they’re not going to trust you. No big surprise there.

The whole ‘them and us’ attitude that a lot of developers and DBAs have is, quite frankly, stupid and highly counterproductive. The DBAs are the ones responsible for the database portion of custom apps once those apps are in production; they should have some input during the development phase, whether it be on the database design, the stored procedures, or just ensuring that the app will scale to the required production load.

So, in conclusion…

Developers, speak to the DBAs and get them involved in projects early. You may be surprised how much value they can add to the project and how much smoother things can go when everyone’s working together instead of fighting each other. Of course, if your DBAs are the arrogant, overbearing type that give the profession a bad name, it may not be that easy.

DBAs, if you know that the developers avoid you, hide projects and prefer to go their own way, first take a look at your own behaviour. Ask to be involved in the early stages of development. Chat with the developers and see if there’s anything you can help out with, or any areas where they need assistance. You may be surprised how many problems can be ironed out before they become problems in the production environment. Of course, if your developers are the type that triggered this rant, it may not be that easy.

</soapbox>

On Counts

Or “What’s the fastest way to count the rows?”

It’s fairly common to need to know the number of rows in a table, the number of rows that match a certain condition, or whether or not there are any rows that match a condition. There are also a number of ways of doing so, some better than others. The problem is that counting is not a cheap operation, especially on big tables. It’s not as bad as a sort, but it can still be expensive.
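One special case worth calling out up front: when the question is only “are there any matching rows?”, counting them all is wasted work, since an EXISTS check can stop at the first match. A minimal illustration (the table and filter are just for demonstration):

IF EXISTS (SELECT 1 FROM Sales.SalesOrderDetail WHERE ProductID = 710)
    PRINT 'There are matching rows'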

So, given that, let’s take a look at some of the ways.

Querying the metadata

If all that’s needed is the number of rows in the table, and it’s not 100% important that the value be completely accurate all the time, the system metadata can be queried. In SQL 2000 and below, that info was in sysindexes. In 2005 and higher it’s been moved into sys.partitions.

SELECT OBJECT_NAME(object_id) AS TableName, SUM(rows) AS TotalRows
FROM sys.partitions
WHERE index_id IN (0,1) -- 0 = heap, 1 = clustered index; either carries the table's row count
AND object_id = OBJECT_ID('TableName')
GROUP BY object_id
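On 2005 and higher the same information is also exposed through the sys.dm_db_partition_stats DMV; an equivalent form of the query, sketched here with the same accuracy caveats, would be:

SELECT OBJECT_NAME(object_id) AS TableName, SUM(row_count) AS TotalRows
FROM sys.dm_db_partition_stats
WHERE index_id IN (0,1)
AND object_id = OBJECT_ID('TableName')
GROUP BY object_id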

The advantage of this approach is that it is fast. Since it’s not actually counting anything and, in fact, isn’t even accessing the table that’s being counted, it’s the fastest way to get the count of rows in the table.

The disadvantage is that it can only get the number of rows in the table and cannot consider any criteria at all. It also may not be 100% accurate, depending on how and when the table’s rowcount metadata is updated by the SQL engine. (more…)

SA SQL Usergroup April meeting

The April meeting of the South African SQL Server Usergroup will be held on Tue 21 April, 18h30 at the Microsoft Offices in Bryanston.

This month, Gary Hope will be presenting on SQL 2008’s spatial features. As soon as I have a better description, I will update this post.

Please let me know if you’re coming, I need numbers for catering purposes.

Look forward to seeing you there.

PASS Summit submissions

I finally stopped dithering over the small details and got my submissions for the PASS conference in, with about five hours remaining before the original deadline.

Since it seems to be somewhat of a tradition this year to post the details (as Brent, Jack and Michelle have done), I’ll list mine.

Lies, damned lies and statistics

Ever wondered what the things called statistics are, why they’re important and what needs to be done to maintain them? If so, this session is for you!

In this session we’ll take a look at what statistics are and why SQL keeps them; at how SQL maintains them and under what circumstances that maintenance is insufficient; and we’ll look at the problems that result when they are inaccurate. Finally we’ll cover some suggestions and options around maintenance of statistics when the automatic maintenance is not sufficient.

Delving the depths of the Plan Cache

Do you want to know more about how your server is running? Do you want to be able to see how queries are running within the server, both now and earlier? If so, this session is not to be missed.

In this session we’ll take a look at some of the information that is available within SQL relating to query optimisation and query execution. We’ll look at some of the Dynamic Management Views and Dynamic Management Functions that expose data from the plan cache, some relevant Trace events and we’ll take a brief look at what Extended Events offers in this area.

Insight into Indexes

Are your indexes being used? Are there any that are taking disk space but aren’t contributing usefully to the server’s workload? Are there queries that could benefit from indexes that don’t exist?

In this session, we’ll be looking at the index-related dynamic management views to see what information SQL makes available on index usage and missing indexes, how that information can be used, and what the shortfalls of those DMVs are.

So now we get to sit until the end of May and see who’s in and who’s not.

Dynamic SQL and SQL injection

When I wrote about the catch-all queries, someone asked why the dynamic SQL that I offered wasn’t vulnerable to SQL injection. I thought I’d go into the whys and the wherefores of that in a little bit more detail.

I’m just going to look at SQL injection from the aspect of dynamic SQL. The front-end code aspect has been dealt with hundreds of times, most recently here – http://www.simple-talk.com/community/blogs/philfactor/archive/2009/03/30/72651.aspx

The most important thing to realise with SQL injection (and with all other forms of command injection) is that it requires that a user-inputted string be incorporated as part of a command that’s going to be executed. Not as part of a parameter value, but as part of the command itself.

Let me show you what I mean.

DECLARE @sSQL varchar(500)
SET @sSQL = 'SELECT * FROM sys.objects'

EXECUTE (@sSQL)

In this exceedingly simple example, there’s no possibility of SQL injection. There’s no user-inputted string that can become part of the command. Let’s look at two slightly more complex examples.

Example 1:


DECLARE @inputParam VARCHAR(100) -- Assume this comes from user input
DECLARE @sSQL varchar(500)

SET @sSQL = 'SELECT * FROM '

IF @inputParam = 'Table1'
    SET @sSQL = @sSQL + 'Table1'
IF @inputParam = 'Table2'
    SET @sSQL = @sSQL + 'Table2'
IF @inputParam = 'Table3'
    SET @sSQL = @sSQL + 'Table3'
IF @inputParam = 'Table4'
    SET @sSQL = @sSQL + 'Table4'

EXECUTE (@sSQL)

Example 2:


DECLARE @inputParam VARCHAR(100) -- Assume this comes from user input
DECLARE @sSQL varchar(500)

SET @sSQL = 'SELECT * FROM ' + @inputParam

EXECUTE (@sSQL)
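Example 2 is the dangerous one: whatever the user supplies becomes part of the command itself. To illustrate with a purely hypothetical malicious value:

-- If the user passes:  sys.objects; DROP TABLE dbo.Orders; --
-- the string that gets executed becomes:
-- SELECT * FROM sys.objects; DROP TABLE dbo.Orders; --

Where the varying part is a value rather than an object name, the usual defence is to keep the user input in a typed parameter and run the command through sp_executesql, so the input can never execute as code. A minimal sketch (the column and filter are illustrative):

DECLARE @inputParam varchar(100) -- Assume this comes from user input
DECLARE @sSQL nvarchar(500)

SET @sSQL = N'SELECT * FROM sys.objects WHERE name = @ObjectName'

-- @inputParam is passed as a parameter, not concatenated into the command
EXEC sp_executesql @sSQL, N'@ObjectName sysname', @ObjectName = @inputParam

Object names can’t be parameterised, so for Example 2 specifically the input would have to be validated against a known list (much as Example 1 does) or escaped with QUOTENAME.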

(more…)