This post, like last week’s, is based off the presentation I did to the DBA Fundamentals virtual chapter.
The request was for more details on the method I use to capture wait and file stats on servers, The methods are pretty similar, so I’ll show waits.
This is by no means the only way of doing it, it’s the way I do it.
Part the First: Capture job
This is the easy part. Into a job step goes the following:
INSERT INTO Performance.dbo.WaitStats SELECT wait_type as WaitType, waiting_tasks_count AS NumberOfWaits, signal_wait_time_ms AS SignalWaitTime, wait_time_ms - signal_wait_time_ms AS ResourceWaitTime, GETDATE() AS SampleTime FROM sys.dm_os_wait_stats WHERE wait_time_ms > 0 AND wait_type NOT IN (<list of waits to ignore>);
Schedule the job to run on an interval for a couple of days. I like to run it every 15 min, maybe every half an hour. I’m trying to get overall behaviour, not identify queries. If I need later to see what queries incur a particular wait, I can use an extended event session.
For the list of waits to ignore, I use Glenn’s list, the latest version found at http://www.sqlskills.com/blogs/glenn/sql-server-diagnostic-information-queries-detailed-day-14/
I run this no less than a day, preferably a week if I can. 2-3 days is normally what I get.
Part the Second: Analysis script
The analysis script does two things:
- Get the wait times within an interval
- Pivot them so that I can easily graph in excel
To see which waits I want to include in the pivot, I look at the 20 waits with the highest increase in the interval monitored (this requires that the server wasn’t restarted during it).
I’m not necessarily going to graph and analyse all of them, but it does help ensure I don’t miss something interesting (like, for example, high LCK_M_Sch_S locks every day between 08:00 and 08:45)
For the purposes of this post, let’s say the ones I’m interested in for a particular analysis are LCK_M_IX, PAGELATCH_EX, LATCH_EX and IO_COMPLETION.
To be clear, those are for this example only. Do Not copy the below code and run without specifying the waits you’re interested in looking at, or the results are going to be less than useless.
The first thing I want to do is add a Row_Number based on the times the wait stats were recorded, so that I can join and take the difference between one interval and the next. In theory it should be possible to do this with times, but the insert doesn’t occur at exactly the same time, to the millisecond, each interval, hence this would require fancy date manipulation. Easier to use a ROW_NUMBER
SELECT WaitType, NumberOfWaits, SignalWaitTime, ResourceWaitTime, SampleTime, ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval FROM dbo.WaitStats WHERE WaitType IN ('LCK_M_IX', ‘PAGELATCH_EX’, 'LATCH_EX', 'IO_COMPLETION');
Next step, turn that into a CTE, join the CTE to itself with an offset and take the difference of the waiting tasks, the signal wait time and the resource wait time.
WITH RawWaits AS (SELECT WaitType, NumberOfWaits, SignalWaitTime, ResourceWaitTime, SampleTime, ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval FROM dbo.WaitStats WHERE WaitType IN ('LCK_M_IX', ‘PAGELATCH_EX’, 'LATCH_EX', 'IO_COMPLETION') ) SELECT w1.SampleTime, w1.WaitType AS WaitType, w2.NumberOfWaits - w1.NumberOfWaits AS NumerOfWaitsInInterval, w2.ResourceWaitTime - w1.ResourceWaitTime AS WaitTimeInInterval, w2.SignalWaitTime - w1.SignalWaitTime AS SignalWaitTimeInInterval FROM RawWaits w1 LEFT OUTER JOIN RawWaits w2 ON w2.WaitType = w1.WaitType AND w2.Interval= w1.Interval + 1;
Last step, pivot the results. This will pivot and show the resource wait. Change the column that’s in the select and the pivot to show the others. It doesn’t matter what aggregation function is used because there’s only one value in each interval, so sum, avg, min and max will all give the same result (just, don’t use count)
WITH RawWaits AS (SELECT WaitType, NumberOfWaits, SignalWaitTime, ResourceWaitTime, SampleTime, ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval FROM dbo.WaitStats WHERE WaitType IN ('LCK_M_IX', 'PAGELATCH_EX', 'LATCH_EX', 'IO_COMPLETION') ), WaitIntervals AS (SELECT w1.SampleTime, w1.WaitType AS WaitType, w2.NumberOfWaits - w1.NumberOfWaits AS NumerOfWaitsInInterval, w2.ResourceWaitTime - w1.ResourceWaitTime AS WaitTimeInInterval, w2.SignalWaitTime - w1.SignalWaitTime AS SignalWaitTimeInInterval FROM RawWaits w1 LEFT OUTER JOIN RawWaits w2 ON w2.WaitType = w1.WaitType AND w2.Interval = w1.Interval + 1 ) SELECT * FROM (SELECT SampleTime, WaitType, WaitTimeInInterval FROM WaitIntervals ) p PIVOT ( AVG(WaitTimeInInterval) FOR WaitType IN ([LCK_M_IX], [PAGELATCH_EX], [LATCH_EX], [IO_COMPLETION]) ) AS pvt ORDER BY SampleTime;
And there we have a result that can easily be imported into excel (or R) and graphed or analysed further.
Hello Gail,
Fisrt of all thanks for your great post but I guess that your last query has a problem.
Instead of using “w2.WaitNumber = w1.WaitNumber + 1”
you would want to use “w2.Interval = w1.Interval + 1”.
Thanks again,
Pedro
Fixed.
Hello Gail,
I also want to thank you for this and all those incredibly interesting facts and information you share with us.
The WaitNumber/Interval typo still exists in the script second to the last. It seems you used wait_type (as from the DMV) in the first script to collect the data but waittype (without underline) in the subsequent scripts.
Nevertheless, thanks again – and sorry for poor my English. 🙂
Yes, I used waitype without the underscores in later queries. It’s coming from my table, not the DMV. Same reason it’s NumberOfWaits, not waiting_tasks_count.