Latest Posts

Pass Summit 2016 abstract reviews

Following on Steve’s blog post on summit abstracts, I decided to publish mine.

My comments are not intended as an attack on the program committee, it’s an incredibly hard job that you couldn’t pay me to do (I’ve done similar work before and hated it). What I hope comes out of this, and the other posts which Steve reviewed, are that comments are more constructive and can be used to improve the abstract.

When the initial mails were sent out, I had one general session and one lightning talk accepted. Since then, I’ve been asked to also present one of the sessions that was listed as alternate.

So, without any editing, the abstracts and comments:

All about Indexes (half-day, level 300, declined)

Indexes are essential to good database performance, but it can be hard to decide what indexes to create and the 'rules' around indexes often appear to be vague or downright contradictory.

In this session we'll dive deep into indexes, have a look at their architecture and internal structure and how that affects the way that indexes are used in query execution. We’ll look at why clustered indexes are recommended on almost all tables and how their architecture affects the choice of columns. We’ll look at nonclustered indexes; their architecture and how query design affects what indexes should be created to support various queries.

Comments

Abstract starts off really well, but last sentence has some structure/repetition issues. Level might be appropriate, but I wonder if in a 3 hour session you can cover enough detail to justify a 300 level.

Nice abstract and overall flow between title, abstract and goals.

Abstract seems well written and gives decent insight into session contents. It would be better to mention 2014 & 2016 features to make the session more attractive. The topic of indexes is somewhat overdone and may have some difficulty attracting attendees. Goals are decent but somewhat generic and could benefit from being more tangible. 25% demo on this topic seems a bit low to keep attendees engaged for a 3 hour session.

Need more meat in abstract. Are you going to talk about column, filtered or other index types? Only 25% demos, you have 3 hours???

Could cover more breadth in 3 hours. Filtered, sparse, columnstore etc.

I don’t see this for a half-day session. The goal are just too weak. And only 25% demo’s? You mean you are going to be talking for 3 hours and demoing for an hour? zzzzz. Need to flesh this abstract out a lot more.

Subjective: For a 3h long session, even at lvl300 , would expect a bit more demos to keep the attendees from nodding off…

Interesting combination. One wonders if I can cover enough material in 3 hours to make it a level 300 and others imply that there’s not enough material to spend 3 hours on.

Thing is, I’ve given this presentation twice before, as a 3 hour session. Pass Summit in Charlotte and SQLBits in London. It fits into 3 hours, providing there aren’t too many questions, that is. If I was going to add clustered and nonclustered columnstores and Hekaton’s range and hash indexes (and the interactions between them, eg non-clustered rowstore index on a clustered columnstore), then it would be a full day precon at least.

There are no 2014 or 2016 features mentioned, because the rowstore indexes didn’t change much in either version. Sure, I could call out the increased key size, but that’s not exactly abstract material. Maybe a footnote on one slide.

As for the topic being overdone and will have difficulty attracting attendees, there are only two sessions on rowstore indexes this year, a precon (Kendra Little) and a 100-level general session (Kathi Kellenberger), plus a few internals presentations and performance-related presentations that include indexing. Every time I’ve given an indexing presentation at a SQLSaturday, Pass Summit or SQLBits I’ve had a packed room. Sure, it isn’t shiny and new, but it’s probably relevant to most systems we design, develop and administer.

“You mean you are going to be talking for 3 hours and demoing for an hour? “ <snark_mode = on> No, I planned to do the middle hour as a mime act. </snark_mode>

Last time I checked, a 3 hour session meant talking for 3 hours. I’m sorry that indexing sounds so dry and boring that attendees are sure to fall asleep during it. I haven’t yet had anyone falling asleep in my indexing sessions. Maybe everyone who did, did so quietly and I didn’t notice. Maybe I should bring fireworks next time.

Go, Go, Query Store! (general session, level 200, accepted)

One of the hardest things to do in SQL is to identify the cause of a sudden degradation in performance. The DMVs don’t persist information over a restart of the instance and, unless there was already some query benchmarking (and there almost never is), answering the question of how the queries behaved last week needs a time machine. Up until now, that is. The addition of the Query Store to SQL Server 2016 makes identifying and resolving performance regressions a breeze.

In this session we’ll take a look at what the Query Store is and how it works, before diving into a scenario where overall performance suddenly degraded, and we’ll see why Query Store is the best new feature in SQL Server 2016, bar none.

Comments

Topic: Great topic. SQL Server 2016 and new shiny features like QS will definitely draw a crowd.
Abstract: Very well written with clear supportive goals.
Subjective: This sounds like a great session I would like to attend. The one issue I have is that it is listed as only 25% demo. Demos are the best way to who off QS functionality. 50% demo would be better.

I would like to attend this session

The outline seems to clearly describe the contents of the presentation. The topic and goals should be compelling to attendees. The target audience should be big enough to support this session. There appears to be a reasonable amount of live demonstrations in relation to the topic being presented.

Abstract: detailed, clear
topic: relevant and new, one of the more interesting feature of sql server 2016
Subjective rating: a good session

Abstract: well written, good topic
Topic: good topic, eye catching
Subjective: good session, would be interested in this

Nothing much to say here. The session will likely be closer to 50% demos, the original plan was 25%, but when I built the slides for SQL Saturday Iceland, I ended up putting more demos in and less slides.

Is that a parameter I smell? (general session, level 300, declined)

All too often a forum post on erratic query performance is met with a reply &quot;Oh, it&#39;s parameter sniffing. You can fix it with &lt;insert random solution here&gt;.&quot; The problem with answer, even if it has identified the cause, is that it’s only part true. Parameter sniffing is not simply a problem that needs fixing; it&#39;s an essential part of well-performing queries. In most cases.

Come to this session to learn what Parameter Sniffing really is and why it’s a good thing, most of the time. Learn how to identify the scenarios where it’s not good, why a feature that is supposed to improve query performance sometimes degrades it, and what your options are for resolving the problems when they do occur.

Comments

Abstract: Well presented, explains purpose of session.
Topic: catchy title, still timely and relevant.
Subjective: Nice mix of demo for this topic.

title not cute, if someone does not know about this they would not come Good abstract and good goals

Abstract: Grammar is a bit rough making the abstract difficult to read.

topic is decent . while it not “latest” or “hot”, it address a real life problem that developers can face. Abstract is a bit too generic and could benefit from more tangible details. Goals are okay but somewhat generic. demo % is decent.

Thanks for submitting a session that covers a really hot button topic in database queries! Looking forward to seeing it.

Again, fair enough, nothing much I can say about these comments.

On Transactions and Atomic Operations (general session, level 200, declined)

If there’s one thing that we, as SQL developers, do, it&#39;s not use enough transactions.

Transactions are critical when multiple changes need to be made entirely or not at all, but even given that it’s rare to see transactions used at all in most production code

In this session, we&#39;ll look at what transactions are and why we should use them. We&#39;ll explore the effects transactions have on locking and the transaction log. We&#39;ll investigate methods of handling errors and undoing data modifications, and we&#39;ll see why nested transactions are a lie.

Comments

Abstract: Grammar in first sentence is difficult to read. Too many commas.

good topic, need more meat in abstract and goals.

topic may not be “latest” but its a foundation subject and of relevance to most database developers. abstract clearly conveys what to expect from the session. goals are somewhat terse and could benefit from tangible details. demo % is a bit low.

Atomic Operations is in the title, but not used/defined in the abstract itself.

Abstract: Second sentence is a bit run-on and incoherent.
Topic: Fit for purpose, but likely a bit niche, as the author already seems to fears.
Subjective: Would have expected at least 50% demo – after all the easiest way to understand the need for TRANS is to see data buggered up by colliding updates.

Yes, I do have a tendency to torture my commas (they always confess at the end) and play with grammar. I will try to remember next year to write with simple, plain grammar. I do need to flesh this out a bit for next time it’s submitted anywhere.

I’m not sure what in the abstract conveys my fear that it’s a niche session, I don’t think it is. From working on various clients’ production systems over the last few years, I very seldom see explicit transactions. It’s something we don’t do enough, not a niche topic.

I’m also not sure that the reviewer meant by ‘colliding updates’. Updates, like all statements, are always part of a transaction, even if it is just one automatically started and committed. If two sessions were to try and update the same column, same rows at the same time to different values, the outcome will be as if one or the other had run, not a mixture of the two. And wrapping the update in an explicit transaction won’t change that behaviour

The Many Latencies of TempDB (general session, level 400, originally alternate, later accepted)

TempDB gets a bad rap when it comes to performance and scalability, and it’s all-too-often well deserved. A badly configured TempDB can have devastating effects on throughput. Combine that with poor queries and, well, you didn’t have any plans for the weekend, right?

In this session we’ll look at some common causes of TempDB contention, both query-based and configuration-based. We’ll look at guidelines for configuring TempDB and when and why you’d make various changes, and we’ll cover more ways to monitor TempDB than you can shake a stick at.

Now, about those weekend plans…

Comments

Abstract: Clear and well written abstract. No prerequisite listed.
Topic: Topic and goals seem like they would be appealing to attendees.
Subjective: Sounds like a good session that people would benefit from.

Abstract: interesting
Topic: relevant
Subjecttive rating: compelling, high level

Abstract: The outline and details of this abstract are well written
Topic: This is a good topic
Subjective: I may attend this session

The session prerequisites are helpful. A great topic and the Abstract is almost good. It is funny but not enough information about the session.

Session prerequisites: TBC?
Goal 1: Correct the “&#39;”
Objective: I would like to attend this session.

The pre-reqs here were a mistake on my part. Despite checking and re-checking the abstracts, that mistake slipped through (tbc = to be completed).

I would love to have corrected the “&#39;”, as well as the &quot; and &gt; that appeared scattered through the abstracts, but they appeared on submit and editing afterwards didn’t help. Something, somewhere in the website messed up the HTML encoding a bit.

Watch a query run. Run, query, run! (Lightning talk, level 100, declined)

Previously if you wanted to get any run-time statistics for a query, you had to include the actual execution plan and run the query to completion. No more! New in SQL 2016 is the live query statistics that let you watch the execution of a query, in real time, and see when the operators run and where the data flows.

In this lightning talk we’ll look at how the live query statistics works and discuss some scenarios where this will really help debugging strange query behaviour.

Comments

Perfect topic/abstract for Lightning Talk, sign me up!

interesting and current topic. well written abstract with good details of session contents. Goals are terse but clear.

Sounds like a good topic for a Lightning Talk.

Great topic and good intro to query store. Hopefully this talk will provide concise knowledge.
Topic is good, will interest a large amount of attendess.
Level is excellent – it’s a new feature so 100 level is great. I would have liked to see mention of query store in the title.

good topic

New topic on new technology. Small enough to be interesting

Interesting and great topic. Abstract and Goals talks to each other. New features is always a good place to grab peoples attention.

The comment on wanting query store to appear in the title is mystifying, because this session has nothing to do with query store. It’s showing off the Live Query Statistics (which, while nice, is too small to warrant anything more than a lightning session alone)

Why Temporal Tables and Stretch Database Are Best Friends (Lightning talk, level 200, accepted)

SQL Server 2016 introduced, among several other new features, Temporal tables and Stretch database.

From a distance, they appear not to have much to do with each other. Temporal tables allow you to query the table as it was at a point in time. Stretch allows for tables to span the earthed SQL Server and the cloud.

In this short session we’ll look at why these two features work spectacularly well together and why a stretched temporal table makes perfect sense.

Comments

Clear understanding of the objective, and as a “new” feature people will drawn to this.

Perfect Lightning Talk abstract, focused topic and something new with the latest version of SQL Server.

Sounds interesting!

interesting topic – new and relevant. Abstract is brief but gives good insight into session contents. Goals are terse, but clear and tangible. No demos may be off-putting to some attendees.

Earthed SQL is an interesting expression and certainly made me think.
There’s two new features here – is that too much for a 10 minute lightening talk?
If good connection is made early between the two then this could be good
Three goals seem to sum up the presentation well.

Try just one topic in 10 minutes or create a 75 or 3 hour session on new features of 2016

No real “pull” for a short session.

“No demos may be off-putting”. Um, what? This is a 10-minute lightning talk, I’m not demoing features in 10 minutes and I’d be very surprised if attendees expected demos in a 10 minute session.

I can’t take credit for the “Earthed SQL” expression, that one’s from Rimma’s Keynote presentation in 2014

SQL Server 2016 features: R services

One of the more interesting features in SQL 2016 is the integration of the R language.

For those who haven’t seen it before, R is a statistical and data analysis language. It’s been around for ages, and has become popular in recent years.

R looks something like this (and I make no promises that this is well-written R). Taken from a morse-code related challenge

MessageLetters <- str_split(Message, "")

MessageEncoded <- list(1:length(MessageLetters))

ListOfDots <- lapply(lapply(c(MaxCharacterLength:1), function(x) rep.int(".", times = x)), function(x) str_c(x, collapse=''))
ListOfDashes <- lapply(lapply(c(MaxCharacterLength:1), function(x) rep.int("-", times = x)), function(x) str_c(x, collapse=''))

If you’re interested in learning R, I found the Learning R book to be very good.

SQL 2016 offers the ability to run R from a SQL Server session. It’s not that SQL suddenly understands R, it doesn’t. Instead it can call out to the R runtime, pass data to it and get data back

Installing the R components are very easy.

2016-05-23_09-39-43

And there’s an extra licence to accept.

2016-05-23_09-44-48

It’s worth noting that the pre-installed Azure gallery image for RC3 does not include the R services. Whether the RTM one will or not remains to be seen, but I’d suggest installing manually for now.

Once installed, it has to be enabled with sp_configure.

EXEC sp_configure 'external scripts enabled', 1
RECONFIGURE

It’s not currently very intuitive to use. The current way R code is run is similar to dynamic SQL, with the same inherent difficulties in debugging.

EXEC sp_execute_external_script
  @language = N'R',
  @script = N'data(iris)
    OutputDataSet <- head(iris)'
  WITH RESULT SETS (([Sepal.Length] NUMERIC(4,2) NOT NULL, [Sepal.Width] NUMERIC(4,2) NOT NULL, [Petal.Length] NUMERIC(4,2) NOT NULL, [Petal.Width]  NUMERIC(4,2) NOT NULL, [Species] VARCHAR(30)));
go

It’s possible to pass data in as well, using a parameter named @input_data_1 (there’s no @input_data_2) and from what I can tell from the documentation @parameter1, which takes a comma-delimited list of values for parameters defined with @params. There’s no examples using these that I can find, so it’s a little unclear how they precisely work.

See https://msdn.microsoft.com/en-us/library/mt604368.aspx and https://msdn.microsoft.com/en-us/library/mt591993.aspx for more details.

It’s not fast. The above piece of T-SQL took ~4 seconds to execute. This is on an Azure A3 VM. Not a great machine admittedly, but the R code, which just returns the first 6 rows of a built-in data set, ran in under a second on my desktop. This is likely not something you’ll be doing as part of an OLTP process.

I hope this external_script method is temporary. It’s ugly, hard to troubleshoot, and it means I have to write my R somewhere else, probably R Studio, maybe Visual Studio, and move it over once tested and working. I’d much rather see something like

CREATE PROCEDURE GetIrisData
  WITH Language = 'R' -- or USQL or Python or …
  AS
…
GO

Maybe in SQL Server 2020?

SQL Server 2016 features: Temporal Tables

Another new feature in SQL 2016 is the Temporal Table (or System Versioning, as its referred to in the documentation). It allows a table to be versioned, in terms of data, and for queries to access rows of the table as they were at some earlier point in time,

I’m quite excited about this, because while we’ve always been able to do this manually, with triggers or CDC or CT, it’s been anything but trivial. I remember trying to implement a form of temporal tables back in SQL 2000, using triggers, and it was an absolute pain in the neck.

So how does it work? Let’s start with a normal un-versioned table.

CREATE TABLE dbo.Stock (
  StockReferenceID INT IDENTITY(1, 1) NOT NULL,
  IssueID INT NULL,
  Condition VARCHAR(10) NULL,
  AvailableQty SMALLINT NULL,
  Price NUMERIC(8, 2) NULL,
  PRIMARY KEY CLUSTERED (StockReferenceID ASC)
);

To make that a temporal table, we need to add two columns, a row start date and a row end date.

ALTER TABLE Stock
  ADD PERIOD FOR SYSTEM_TIME (RowStartDate, RowEndDate),
    RowStartDate DATETIME2 GENERATED ALWAYS AS ROW START HIDDEN NOT NULL DEFAULT GETDATE(),
    RowEndDate DATETIME2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL DEFAULT CONVERT(DATETIME2, '9999-12-31 23:59:59.99999999');

It’s a little complicated. From my, admittedly limited, testing, the NOT NULL and the DEFAULT are required. The start time’s default needs to be GETDATE() and the end time’s default needs to be the max value of the data type used.

Hidden is an interesting property, it means that the columns won’t appear if SELECT * FROM Stock… is run. The only way to see the column values is to explicitly state them in the SELECT clause.

TemporalHiddenColumns_thumb.png

I wonder if that property will be available for other columns in the future. It would be nice to be able to mark large blob columns as HIDDEN so that SELECT * doesn’t end up pulling many MB per row back.

That ALTER adds the row’s two time stamps. To enable the versioning then just requires

ALTER TABLE Stock
  SET (SYSTEM_VERSIONING = ON);

Once that’s done, the table gains a second, linked, table that contains the history of the rows.

TemporalTable-Object-Explorer_thumb.png

421576540 is the object_id for the Stock table. If is also possible to specify the name for the history table in the ALTER TABLE statement, if preferred.

The history table can be queried directly. The start and end times aren’t hidden in this one.

TemporalHistoryTable.png

Or, the temporal table can be queried with a new clause added to the FROM, FOR SYSTEM TIME… Full details at https://msdn.microsoft.com/en-us/library/mt591018.aspx

HistoryOfRow

Very neat.

The one thing that does need mentioning. This is not an audit solution. If the table is altered, history can be lost. If the table is dropped, history is definitely lost. Auditing requirements are such that audit records should survive both. Use this for historical views of how the data looked, and if you need an audit as well, look at something like SQLAudit.

Upcoming conferences

It’s shaping up to a busy year for conferences, well busy by my standards that is. While I’m unfortunately missing SQLBits, I’ll still be getting a chance to enjoy an English summer.

InsideSQL

The InsideSQL conference is a new conference organised by Neil Hambly. It’s a deep-dive conference, with longer sessions than many conferences offer, and an opportunity to dig deep into topics.

I’m presenting two sessions there, first a look at SQL waits, why there are waits and what various waits types mean, second a nice deep discussion on SQL Server indexes.

SQLSaturday Iceland

Iceland has been on my to-visit list for some time, so a SQLSaturday there? Perfect excuse for a short visit and a bit of exploration. I wonder if there’s any chance I’ll get to see the Aurora Borealis.

I’m doing a full-day precon on the Thursday (Friday is an Icelandic public holiday) on execution plans, as well as a regular session on Query Store on the Saturday.

South African SQLSaturdays

Fast forward to September, the South African SQL Saturdays are again running on back-to-back weekends. Johannesburg on the 3rd of September, Cape Town on the 10th and Durban planned for the 17th.

We’d love to have more international speakers join us for these. The local weather is lovely in September, and the exchange rate to the dollar/pound so poor that you won’t believe how cheap things are here. Come down for a two week African holiday, and get three SQLSaturday presentations in on the side.

SQL 2016 features: Stretch Database

Stretch database allows for a table to span an ‘earthed’ SQL Server instance and an Azure SQL Database. It allows for parts (or all) of a table, presumably older, less used parts, to be stored in Azure instead of on local servers. This could be very valuable for companies that are obliged to retain transactional data for long periods of time, but don’t want that data filling up the SAN/flash array.

After having played with it, as it is in RC2, I have some misgivings. It’s still a useful feature, but probably not as useful as I initially assumed when it was announced.

To start with, the price. Stretch is advertised as an alternative to expensive enterprise-grade storage. The storage part is cheap, it’s costed as ‘Read-Access Geographically Redundant Storage’ blob storage.

PriceStorage

Then there’s the compute costs

PriceCompute

The highest tier is 2000 DSU at $25/hour. To compare the costs to SQL Database, a P2 has the same compute costs as the lowest tier of Stretch, and that’s with a preview discount applied to Stretch. It’s going to be a hard sell to my clients at that price (though that may be partially because of the R15=$1 exchange rate).

The restrictions on what tables are eligible are limiting too. The documented forbidden data types aren’t too much of a problem. This feature’s intended for transactional tables, maybe audit tables and the disallowed data types are complex ones. HierarchyID, Geography, XML, SQL_Variant.

A bigger concern are the disallowed features. No computed columns, no defaults, no check constraints, can’t be referenced by a foreign key. I can’t think of too many transactional tables I’ve seen that don’t have one or more of those.

It’s looking more like an archive table, specifically designed to be stretchable will be needed, rather than stretching the transactional table itself. I haven’t tested whether it’s possible to stretch a partitioned table (or partition a stretched table) in order to partition switch into a stretched table. If it is, that may be the way to go.

I have another concern about stretch that’s related to debugging it. When I tested in RC2, my table was listed as valid by the stretch wizard, but when I tried, the ALTER TABLE succeeded but no data was moved. It turned out that the Numeric data type wasn’t allowed (A bug in RC2 I suspect, not an intentional limitation), but the problem wasn’t clear from the stretch-related DMVs. The problem is still present in RC3

StretchDMV

The actual error message was no where to be found. The new built-in extended event session specifically for stretch tables was of no additional help.

StretchXE

The error log contained a different message, but still not one that pinpointed the problem.

This blog post was based on RC2 and written before the release of RC3, however post RC3 testing has shown no change yet. I hope at least the DMVs are expanded before RTM to include actual error messages and more details. We don’t need new features that are hard to diagnose.

As for the other limitations, I’m hoping that Stretch will be like Hekaton, very limited in its first version and expanded out in the next major version. It’s an interesting feature with potential, I’d hate to see that potential go to waste.

SQL Server 2016 features: Live query statistics

Ever wanted to look at a query’s actual execution plan (execution plan with runtime information) without waiting for the query to finish? Now you can.

LiveQueryStats

Enable that and run a query, and you get an execution plan immediately, one with a few more details in it than we’re used to.

LiveQuery

I’m not sure what the times on the operators show, because a constant scan wouldn’t take over 4 seconds, it generates a single number. I suspect it’s the time between the first row request and the operator returning ‘no more rows’, but I’d have to test more to be sure.

The percentage done is probably based on the estimated row count, because on this plan all the operators went to 99% done instantly. The two numbers underneath the % done are the number of rows currently processed and below that the estimated number of rows.

It’s clear there’s a severe row estimation problem here. The last nested loop join estimated 97 rows and, at the point the screenshot was taken, had processed 170500 rows. It’s a pity that the estimated executions isn’t shown as well, as it makes identifying row estimation errors slightly harder for operators that execute multiple times (eg the key lookup). http://sqlinthewild.co.za/index.php/2009/09/22/estimated-rows-actual-rows-and-execution-count/

Identifying this kind of problem is, I think, one of the primary uses for Live Query Stats (other that using it to see how the QP works). The old example of ‘it ran fast yesterday and slow today, why?’ can be at least partially answered with Live Query Stats. Run the query with it turned on and see what plan is used and what row counts are flowing through them. Combine that with Query Store’s history of what plans were used yesterday, and we have a very powerful way of identifying why something is running unusually slow.

However is it something that should be used sparsely in production, as it does add some  overhead.

LiveStatsOn

In most ways the Live Query Stats behaves like a normal execution plan, it can be saved and if saved part way through a query’s execution, the resultant file is a normal .sqlplan file and has actual row counts of the point in execution where it was saved.

It’s not a ground breaking new feature, but it’s a fun way to see how queries execute and it does have some uses in debugging sudden performance problems.

SQL Server 2016 features: Query Store

Given that SQL Server 2016 is coming ‘real soon now’, it’s probably well past time that I write up some thoughts on new features.

The first one I want to look at is a feature that I’m so looking forward to getting to use, the Query Store.

Query Store is essentially a flight recorder for a SQL database. It tracks queries, their execution characteristics and their execution plans. The best part is that this information is persisted into the database and hence is not lost on restart, as the current performance-related DMV contents are.

The data is aggregated over a defined period of time, by default an hour. This will probably be fine for most cases, if there are performance problems that come and go in fractions of an hour (like a case I had last year), then that interval may need to be reduced, depending on the type of problem.

There are lots and lots of blog posts on how to enable it and how to query it and the like, so I’m not going to repeat that info here. See http://sqlperformance.com/2015/02/sql-plan/the-sql-server-query-store for all of that kind of info.

There’s two main aspects that I want to discuss about this feature. Let’s start with a screenshot of one of the Query Store’s built-in reports, the top resource-consuming queries.

QueryStore

The longest-running (on aggregate) query in this database is this nice parameterised query against the Stock and Issues tables. In the pane at the bottom of that report I can see its query plans.

Nothing really fancy there, except for one thing.

The server that I pulled those reports from has never had that query run against it. I pulled that report from an Azure VM that I set up last week. The workload all dates from 3 weeks ago.

Query Store data is persisted into the database, which means it’s included in a backup. I ran the workload against one server, backed up the database, restored it elsewhere and then queried the query store data. And there’s an even better part.

QueryStoreNoTables

Before running those Query Store reports, I dropped every table in the database.

Before this, to get the aggregated performance characteristics of a workload, I’d have to run a server-side trace or an extended events session, write the results out to a file, copy the file to my analysis server, load them into a table and aggregate the results. It’s a process that could take a day or two.

Now (or at least once SQL 2016 becomes widespread) I can just ask the client for their latest backup as all the query performance data I want is in there. If they’re uncomfortable with me having access to the data, they can restore the DB somewhere, drop all the tables then back up the ‘empty’ database and send me that. Much faster, and far fewer worries about having potentially missed something that ran at a time the trace wasn’t running.

The second thing isn’t so much about Query Store, as it is about the message that’s coming from Microsoft about it. Over and over and over they keep talking about how Query Store can be used to force a good query plan. It’s not hard to do.

Let’s take an example of a query with a parameter sniffing problem.

CREATE PROCEDURE GetOrders (@StartDate DATETIME)
AS
    SELECT  OrderDate,
            IssueID,
            QtyOrdered,
            Total
    FROM    Orders
    WHERE   OrderDate > @StartDate;
GO

The index on OrderDate isn’t covering, so a seek and key lookup is optimal for small numbers of rows and a clustered index scan is better for large numbers of rows

BadQueryLogicalReads

The query in question is by far the worst query in terms of logical reads on the server.

QueryStoreMultiplePlans

And it has two plans associated with it. The one at the top (plan 453) is the seek and key lookup. The one at the bottom (plan 452) is the clustered index scan.

Select the desired plan. It appears in the bottom section, then

ForcePlan

And problem solved.

Well…

I’d argue that’s less fixing the problem than hiding it. Sure, you won’t get a different plan, or at least that’s the idea. (While testing I did manage to get a different plan to the forced one, I need to investigate further why.) But is that forced plan the best solution? In this example, widening the index would have been a better solution. In other cases you might rather want to split the procedure into two, one for some date values one for others, or add the recompile hint, or even change the query.

Forcing a plan is great for stopping a problem that’s currently bringing production down, but it’s far from the only thing that will ever be done now. For starters forcing a good plan requires that there is a good plan, and if the query is written so it can’t use indexes or there are no suitable indexes, there won’t be a good plan to force. It’s a nice tool, but that’s all it is, another tool in the performance tuning box. It’s not a replacement for all other tuning work that’s ever been done

Now, I wonder how long it’s going to take to get my clients to upgrade.

Blocking operators and actual row counts

Query plans can sometimes be hard to read, and other times can be downright mystifying.

Take this plan for example. Not too hard in general. Two index seek/scan, a join, a sort and a filter. The peculiarity here is in the actual row counts. We expect that a join can filter rows out, that a filter can, well, filter rows out, that a top can reduce rows, that any aggregation can reduce the row count.

SortActualRowsBefore

SortActualRowsAfter

But why is a sort operator, a normal sort, reducing the row count? The answer lies in part not in how rows flow through the query plan, but in how control flows through the plan, and in part in the types of operators in the plan.

First let’s look at the types of operators. Here I don’t mean joins and aggregates and the like, I’m referring to whether an operator is a blocking operator or a non-blocking operator.

A non-blocking operator is one that consumes and produces rows at the same time. Nested loop joins are non-blocking operators.

A blocking operator is one that requires that all rows from the input have been consumed before a single row can be produced. Sorts are blocking operators.

Some operators can be somewhere between the two, requiring a group of rows to be consumed before an output row can be produced. Stream aggregates are an example here.

The sort in the plan is a blocking operator, and hence it needs all rows from the operator before it, the loop, before it can output any rows. That’s the 2920 going in to it, but why is there only 50 rows coming out?

That’s down to the way a query executes. Starting at the top of the plan, the top operator, in this case a SELECT asks the operator beneath it for a row. If the requested operator isn’t one that can generate a row (eg an index scan), then it asks the operator beneath it for a row.

The query that generated the shown plan had a filter based on the generated Row_Number of RowNumber between 26 and 50. This filter was executed by the Filter operator and partially by the Top operator.

RestOfPlan

FilterPropertiesTopProperties[3]

The TOP is there because the filter is on a Row_Number, the resultset is sorted by the columns defined in the Row_Number’s order by and there’s no partition by. The row numbered 50 will be the 50th row in the resultset and after that point there can be no more rows that satisfy the predicate. The query processor knows this.

So, the first row is requested by the select. The Filter can’t generate a row so it asks the Top for a row, and so on down the plan until we get to the sort.

The sort can’t request one row from the operator below it, it’s a blocking operator, it has to fetch all the rows from the operator below it. All 2920 of them.

Once the sort has all the rows, it sorts them and returns one row back to the previous operator. Repeat for the next row and the next.

Let’s fast-forward a few rows. The filter has just returned row 50 to the select operator. Select asks for the next row, row 51. The filter asks the top for the next row. The top, however, knows that it was only supposed to return the first 50 rows, and so instead it tells the filter operator that there are no more rows. The filter passes that up to the select and the query end there.

Hence why we have a sort further down the plan that only outputted 50 rows. Not because it filtered the rows itself, but because it was a blocking operator and the operators above it only asked for 50 rows.

It’s important to be able to read the execution plan in both directions. Reading the plan right-to-left is reading it in the direction of the data flow. Reading it left-to-right is reading it in the direction of the control flow. To fully understand plans it’s necessary to be able to do both.

Obsessing over query operator costs

A common problem when looking at execution plans is attributing too much meaning and value of the costs of operators.

The percentages shown for operators in query plans are based on costs generated by the query optimiser. They are not times, they are not CPU usage, they are not IO. The costs that the query optimiser generates are unit-less numbers that it uses internally to estimate the relative expense of plans as it optimises a query.

For starters, the percentages of operators should add to 100 across the entire plan, so worrying that the only data access operator in a simple query plan shows 100% is useless. If there’s only one index seek/scan in the plan, of course it’s going to be close to 100% of the total cost of the plan, there’s no where else for the cost to go.

But that’s not all. The accuracy of these estimates is based, in part, on the accuracy of the row estimations. If the statistics are out of date or there are any other row estimation errors, then the costs and as a result the percentages shown will be wildly incorrect

Take, for example, this query plan.

ExampleExecutionPlan

According to the percentages, the two operators were of equal cost. But a key lookup is just a single-row clustered index seek by a different name, and if we look at the number of executions of the two operators, it’s clear that they cannot possibly have been the same cost to execute

IndexSeek  KeyLookup

An index seek to return 1 million rows and a million index seeks to fetch one row each are not going to take the same amount of resources to execute, and hence those percentages are completely misleading.

Because the percentages can easily be way out, focusing on them when performance tuning is potentially going to result in a lot of wasted time. There is no single value, counter, measure or result that’s going to by itself indicate the cause of performance problems. Obsessing over single data points, or focusing on changing a single data point is almost certainly going to waste time.

Oh, and if anyone still wants to attribute importance to the percentages…

CostPercentages

Monitoring wait stats

This post, like last week’s, is based off the presentation I did to the DBA Fundamentals virtual chapter.

The request was for more details on the method I use to capture wait and file stats on servers, The methods are pretty similar, so I’ll show waits.

This is by no means the only way of doing it, it’s the way I do it.

Part the First: Capture job

This is the easy part. Into a job step goes the following:

INSERT  INTO Performance.dbo.WaitStats
SELECT  wait_type as WaitType,
        waiting_tasks_count AS NumberOfWaits,
        signal_wait_time_ms AS SignalWaitTime,
        wait_time_ms - signal_wait_time_ms AS ResourceWaitTime,
        GETDATE() AS SampleTime
FROM    sys.dm_os_wait_stats
WHERE   wait_time_ms > 0
    AND wait_type NOT IN (<list of waits to ignore>);

Schedule the job to run on an interval for a couple of days. I like to run it every 15 min, maybe every half an hour. I’m trying to get overall behaviour, not identify queries. If I need later to see what queries incur a particular wait, I can use an extended event session.

For the list of waits to ignore, I use Glenn’s list, the latest version found at http://www.sqlskills.com/blogs/glenn/sql-server-diagnostic-information-queries-detailed-day-14/

I run this no less than a day, preferably a week if I can. 2-3 days is normally what I get.

Part the Second: Analysis script

The analysis script does two things:

  • Get the wait times within an interval
  • Pivot them so that I can easily graph in excel

To see which waits I want to include in the pivot, I look at the 20 waits with the highest increase in the interval monitored (this requires that the server wasn’t restarted during it).

I’m not necessarily going to graph and analyse all of them, but it does help ensure I don’t miss something interesting (like, for example, high LCK_M_Sch_S locks every day between 08:00 and 08:45)

For the purposes of this post, let’s say the ones I’m interested in for a particular analysis are LCK_M_IX, PAGELATCH_EX, LATCH_EX and IO_COMPLETION.

To be clear, those are for this example only. Do Not copy the below code and run without specifying the waits you’re interested in looking at, or the results are going to be less than useless.

The first thing I want to do is add a Row_Number based on the times the wait stats were recorded, so that I can join and take the difference between one interval and the next. In theory it should be possible to do this with times, but the insert doesn’t occur at exactly the same time, to the millisecond, each interval, hence this would require fancy date manipulation. Easier to use a ROW_NUMBER

SELECT  WaitType,
        NumberOfWaits,
        SignalWaitTime,
        ResourceWaitTime,
        SampleTime,
        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
FROM    dbo.WaitStats
WHERE   WaitType IN ('LCK_M_IX', ‘PAGELATCH_EX’, 'LATCH_EX', 'IO_COMPLETION');

Next step, turn that into a CTE, join the CTE to itself with an offset and take the difference of the waiting tasks, the signal wait time and the resource wait time.

WITH    RawWaits
          AS (SELECT    WaitType,
                        NumberOfWaits,
                        SignalWaitTime,
                        ResourceWaitTime,
                        SampleTime,
                        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
              FROM      dbo.WaitStats
              WHERE     WaitType IN ('LCK_M_IX', ‘PAGELATCH_EX’, 'LATCH_EX', 'IO_COMPLETION')
             )
    SELECT  w1.SampleTime,
            w1.WaitType AS WaitType,
            w2.NumberOfWaits - w1.NumberOfWaits AS NumerOfWaitsInInterval,
            w2.ResourceWaitTime - w1.ResourceWaitTime AS WaitTimeInInterval,
            w2.SignalWaitTime - w1.SignalWaitTime AS SignalWaitTimeInInterval
    FROM    RawWaits w1
            LEFT OUTER JOIN RawWaits w2 ON w2.WaitType = w1.WaitType
                                           AND w2.Interval= w1.Interval + 1;

Last step, pivot the results. This will pivot and show the resource wait. Change the column that’s in the select and the pivot to show the others. It doesn’t matter what aggregation function is used because there’s only one value in each interval, so sum, avg, min and max will all give the same result (just, don’t use count)

WITH    RawWaits
          AS (SELECT    WaitType,
                        NumberOfWaits,
                        SignalWaitTime,
                        ResourceWaitTime,
                        SampleTime,
                        ROW_NUMBER() OVER (PARTITION BY WaitType ORDER BY SampleTime) AS Interval
              FROM      dbo.WaitStats
              WHERE     WaitType IN ('LCK_M_IX', 'PAGELATCH_EX', 'LATCH_EX', 'IO_COMPLETION')
             ),
        WaitIntervals
          AS (SELECT    w1.SampleTime,
                        w1.WaitType AS WaitType,
                        w2.NumberOfWaits - w1.NumberOfWaits AS NumerOfWaitsInInterval,
                        w2.ResourceWaitTime - w1.ResourceWaitTime AS WaitTimeInInterval,
                        w2.SignalWaitTime - w1.SignalWaitTime AS SignalWaitTimeInInterval
              FROM      RawWaits w1
                        LEFT OUTER JOIN RawWaits w2 ON w2.WaitType = w1.WaitType
                                                       AND w2.Interval = w1.Interval + 1
             )
    SELECT  *
    FROM    (SELECT SampleTime, WaitType, WaitTimeInInterval FROM WaitIntervals
            ) p PIVOT ( AVG(WaitTimeInInterval) FOR WaitType IN ([LCK_M_IX], [PAGELATCH_EX], [LATCH_EX], [IO_COMPLETION]) ) AS pvt
    ORDER BY SampleTime;

And there we have a result that can easily be imported into excel (or R) and graphed or analysed further.