<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SQL in the Wild</title>
	<atom:link href="http://sqlinthewild.co.za/index.php/feed/" rel="self" type="application/rss+xml" />
	<link>http://sqlinthewild.co.za</link>
	<description>A discussion on SQL Server code</description>
	<lastBuildDate>Thu, 11 Mar 2010 14:00:51 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Root of all Evil</title>
		<link>http://sqlinthewild.co.za/index.php/2010/03/11/the-root-of-all-evil/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/03/11/the-root-of-all-evil/#comments</comments>
		<pubDate>Thu, 11 Mar 2010 14:00:51 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[Rants]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=563</guid>
		<description><![CDATA[Or “Shotgun query tuning”
There have been a fair few forum questions in recent months asking for help in removing index scans, loop joins, sorts or other presumed-slow query operators. There have been just as many asking how to replace a subquery with a join or a join with a subquery or similar aspects [...]]]></description>
			<content:encoded><![CDATA[<p>Or “<em>Shotgun query tuning</em>”</p>
<p>There have been a fair few forum questions in recent months asking for help in removing index scans, loop joins, sorts or other presumed-slow query operators. There have been just as many asking how to replace a subquery with a join, or a join with a subquery, or to change similar aspects of a query, usually for performance reasons.</p>
<p>The first question that I have to ask when looking at requests like that is &#8220;Why?&#8221;</p>
<p>Why is removing a particular query operator the goal? Why is changing a where clause predicate the goal? If it’s to make the query faster, has the query been examined and has it been confirmed that the query operator or predicate really is the problem?</p>
<p>The title of this post refers to a comment I’ve seen again and again in blogs or articles about front-end development. &#8220;<em>Premature optimisation is the root of all evil.</em>&#8221; It’s true in the database field as well.</p>
<p><span id="more-563"></span></p>
<p>While optimisation is very important in database development, trying to optimise queries without any idea where the problem with the query is, or even if the query is a problem at all, is about as effective in fixing a database performance problem as using a shotgun from 100 meters is in killing mosquitoes. If you hit the problem, it’s by sheer luck and nothing else.</p>
<p>There are two sides to this problem.</p>
<p>The first aspect of this is, during development, spending time on optimising a query (or stored procedure) without any idea whether or not the query is inefficient and no idea whether or not the changes made are any improvement.</p>
<p>Firstly, this is a waste of time that could be better spent developing other queries. Secondly, it creates an incorrect impression that the queries have been optimised when in fact nothing of the sort has been done.</p>
<p>The second aspect is when, with a production database that is performing badly, queries are modified almost at random in an attempt to fix the performance problem quickly.</p>
<p>This almost never works. It wastes time fixing stuff that very likely isn&#8217;t broken in the first place, and all the while the database performance deteriorates and management curses SQL Server as &#8216;nonscalable&#8217;.</p>
<p>So, what is the right approach for the above two scenarios?</p>
<ol>
<li>Don&#8217;t optimise queries without knowing if they need it.</li>
<li>Don&#8217;t optimise queries without knowing if they need it. <sup>1</sup></li>
</ol>
<h3>New development</h3>
<p>Queries and stored procedures, as they are written, need to be tested against a representative data set on a server with a representative workload, and their performance characteristics evaluated to see if they are acceptable. If the query&#8217;s performance characteristics are acceptable, then that query requires no optimisation.<sup>2</sup></p>
<p>This doesn&#8217;t mean write bad code and push it to production. It means write good, solid code, following accepted coding standards, ensure that it runs acceptably against production volumes of data, and do not spend hours or days trying to get it running a couple of milliseconds faster.</p>
<p>And if the query doesn&#8217;t perform acceptably, identify the problematic portion and fix that. Don&#8217;t flail around rewriting bits of the query in the hope that the problem will magically go away.</p>
<p>The execution plan is the primary tool here, along with the output of Statistics IO.</p>
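<p>As a rough sketch of what that measurement can look like in Management Studio (the query here is only a stand-in against system views, not part of any real workload), the statistics output can be switched on around the query being examined:</p>
<pre class="brush: sql;">-- Report reads per table and CPU/elapsed time in the Messages tab
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in query; replace with the query under investigation
SELECT o.name, COUNT(*) AS ColumnCount
FROM sys.objects AS o
INNER JOIN sys.columns AS c ON c.object_id = o.object_id
GROUP BY o.name;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;</pre>
<p>Run it with the actual execution plan included, and compare the logical reads and CPU time before and after each change, so that any improvement is measured rather than assumed.</p>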
<h3>Fixing existing code</h3>
<p>When evaluating existing databases with known performance problems, limit the performance tuning to queries that really are performing badly and need optimisation. It&#8217;s often true that fixing the top 5-10 worst performing queries will have a massive effect on overall system performance, far more than tuning twice that number of queries that aren&#8217;t really a problem.</p>
<p>The best tool for finding which queries really are the worst offenders is SQL Trace.</p>
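<p>Where setting up a trace isn&#8217;t practical, a rough first look can be had from the plan cache instead. Note that this is a different technique from SQL Trace and only a sketch: it assumes SQL Server 2005 or later, and sys.dm_exec_query_stats only covers queries whose plans are still in cache, so it can miss things a trace would catch.</p>
<pre class="brush: sql;">-- Top 10 cached statements by total logical reads since their plans were cached
SELECT TOP (10)
    qs.execution_count,
    qs.total_logical_reads,
    qs.total_worker_time,   -- CPU time, in microseconds
    qs.total_elapsed_time,  -- duration, in microseconds
    st.text AS BatchText    -- text of the batch or procedure containing the statement
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;</pre>
<p>Ordering by total_worker_time or total_elapsed_time instead gives the worst offenders by CPU or by duration.</p>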
<p>When looking at queries that are a problem, identify the portions that are inefficient and target attempts at optimisation towards those problems.</p>
<h3>In conclusion</h3>
<p>Measure Twice.<br />
Optimise if necessary.</p>
<hr />
<p>(1) No, that wasn&#8217;t a typo.</p>
<p>(2) At that time. Later changes to schema or data volume may require existing queries to be revised.</p>
<p>For more details on exactly how to identify problematic queries, refer to the <a href="http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-1/">series I wrote at Simple Talk</a> last year.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/03/11/the-root-of-all-evil/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NOT EXISTS vs NOT IN</title>
		<link>http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/#comments</comments>
		<pubDate>Thu, 18 Feb 2010 14:00:32 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=553</guid>
		<description><![CDATA[Continuing with the mini-series on query operators, I want to have a look at NOT EXISTS and NOT IN.
Just one note before diving into that. The examples I’m using are fairly simplistic and that’s intentional. I’m trying to find what, if any, are the performance differences in a benchmark-style setup. I’ll have some comments on [...]]]></description>
			<content:encoded><![CDATA[<p>Continuing with the mini-series on query operators, I want to have a look at NOT EXISTS and NOT IN.</p>
<p>Just one note before diving into that. The examples I’m using are fairly simplistic and that’s intentional. I’m trying to find what, if any, are the performance differences in a benchmark-style setup. I’ll have some comments on more complex examples in a later post.</p>
<p>The most important thing to note about NOT EXISTS and NOT IN is that, unlike EXISTS and IN,  they are not equivalent in all cases. Specifically, when NULLs are involved they will return different results. To be totally specific, when the subquery returns even one null, NOT IN will not match any rows.</p>
<p>The reason for this can be found by looking at the details of what the NOT IN operation actually means.</p>
<p>Let’s say, for illustration purposes, that a table called t has 4 rows, with a column called ID holding the values 1..4 and a column called AVal holding the values being compared.</p>
<pre class="brush: sql;">WHERE SomeValue NOT IN (SELECT AVal FROM t)
-- is equivalent to
WHERE (
SomeValue != (SELECT AVal FROM t WHERE ID=1)
AND
SomeValue != (SELECT AVal FROM t WHERE ID=2)
AND
SomeValue != (SELECT AVal FROM t WHERE ID=3)
AND
SomeValue != (SELECT AVal FROM t WHERE ID=4)
)</pre>
<p>Let’s further say that AVal is NULL where ID = 4. Hence that != comparison returns UNKNOWN. The logical truth table for AND states that UNKNOWN and TRUE is UNKNOWN, UNKNOWN and FALSE is FALSE. There is no value that can be AND’d with UNKNOWN to produce the result TRUE.</p>
<p>Hence, if any row of that subquery returns NULL, the entire NOT IN operator will evaluate to either FALSE or UNKNOWN and no records will be returned.</p>
<p>So what about EXISTS?</p>
<p><span id="more-553"></span></p>
<p>EXISTS cannot return NULL. It’s checking solely for the presence or absence of a row in the subquery and, hence, it can only return true or false. Since it cannot return NULL, there’s no possibility of a single NULL resulting in the entire expression evaluating to UNKNOWN.</p>
<p>Hence, when the column in the subquery that’s used for comparison with the outer table can have nulls in it, consider carefully which of NOT EXISTS or NOT IN you want to use.</p>
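<p>The difference in results is easy to see with a small test. This is a minimal sketch using two made-up table variables (@t and @Candidates are hypothetical names for illustration only), and the multi-row VALUES syntax assumes SQL Server 2008 or later:</p>
<pre class="brush: sql;">-- Hypothetical table variables, purely for illustration
DECLARE @t TABLE (ID INT NOT NULL, AVal INT NULL);
DECLARE @Candidates TABLE (SomeValue INT NOT NULL);

INSERT INTO @t (ID, AVal) VALUES (1, 1), (2, 2), (3, 3), (4, NULL);
INSERT INTO @Candidates (SomeValue) VALUES (1), (5);

-- NOT IN: the NULL in AVal makes the predicate UNKNOWN for 5
-- and FALSE for 1, so no rows are returned
SELECT SomeValue FROM @Candidates
WHERE SomeValue NOT IN (SELECT AVal FROM @t);

-- NOT EXISTS: only checks whether a matching row exists,
-- so 5 (which has no match in @t) is returned
SELECT c.SomeValue FROM @Candidates AS c
WHERE NOT EXISTS (SELECT 1 FROM @t AS t WHERE t.AVal = c.SomeValue);</pre>
<p>Removing the row with the NULL from @t makes both queries return the value 5.</p>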
<p>Ok, but say there are no nulls in the column. How do they compare speed-wise? I’m going to do two tests, one where the columns involved in the comparison are defined as NOT NULL and one where they are defined as NULL. There will be no null values in the columns in either case. In both cases, the join columns will be indexed. After all, we all index our join columns, right?</p>
<p>So, first test, non-nullable columns. First some setup:</p>
<pre class="brush: sql;">CREATE TABLE BigTable (
id INT IDENTITY PRIMARY KEY,
SomeColumn char(4) NOT NULL,
Filler CHAR(100)
)

CREATE TABLE SmallerTable (
id INT IDENTITY PRIMARY KEY,
LookupColumn char(4) NOT NULL,
SomeArbDate Datetime default getdate()
)

INSERT INTO BigTable (SomeColumn)
SELECT top 250000
char(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) +
char(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))
from master.sys.columns a cross join master.sys.columns b

INSERT INTO SmallerTable (LookupColumn)
SELECT DISTINCT SomeColumn
FROM BigTable TABLESAMPLE (25 PERCENT)
-- (3898 row(s) affected)

CREATE INDEX idx_BigTable_SomeColumn
ON BigTable (SomeColumn)
CREATE INDEX idx_SmallerTable_LookupColumn
ON SmallerTable (LookupColumn)</pre>
<p>Then the queries</p>
<pre class="brush: sql;">-- Query 1
SELECT ID, SomeColumn FROM BigTable
WHERE SomeColumn NOT IN (SELECT LookupColumn FROM SmallerTable)

-- Query 2
SELECT ID, SomeColumn FROM BigTable
WHERE NOT EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)</pre>
<p>The first thing to note is that the execution plans are identical.</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/02/ExecPlansNOTNULL.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="ExecPlansNOTNULL" src="http://sqlinthewild.co.za/wp-content/uploads/2010/02/ExecPlansNOTNULL_thumb.png" border="0" alt="ExecPlansNOTNULL" width="244" height="130" /></a></p>
<p>The execution characteristics are also identical.</p>
<blockquote><p><strong>Query 1<br />
</strong>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 342, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 156 ms,  elapsed time = 221 ms.</p>
<p><strong>Query 2<br />
</strong>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 342, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 8, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 156 ms,  elapsed time = 247 ms.</p></blockquote>
<p>So, at least for the case where the columns are defined as NOT NULL, these two perform the same.</p>
<p>What about the case where the columns are defined as nullable? I&#8217;m going to simply alter the two columns involved without changing anything else, then test out the two queries again.</p>
<pre class="brush: sql;">ALTER TABLE BigTable
 ALTER COLUMN SomeColumn char(4) NULL

ALTER TABLE SmallerTable
 ALTER COLUMN LookupColumn char(4) NULL</pre>
<p>And the same two queries</p>
<pre class="brush: sql;">-- Query 1
SELECT ID, SomeColumn FROM BigTable
WHERE SomeColumn NOT IN (SELECT LookupColumn FROM SmallerTable)

-- Query 2
SELECT ID, SomeColumn FROM BigTable
WHERE NOT EXISTS (SELECT LookupColumn FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)</pre>
<p>And as for their performance…</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/02/ExecPlansNull.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="ExecPlansNull" src="http://sqlinthewild.co.za/wp-content/uploads/2010/02/ExecPlansNull_thumb.png" border="0" alt="ExecPlansNull" width="244" height="123" /></a></p>
<blockquote><p><strong>Query 1</strong><br />
Table &#8216;SmallerTable&#8217;. Scan count 3, logical reads 500011, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 437, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 827 ms,  elapsed time = 825 ms.</p>
<p><strong>Query 2<br />
</strong>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 437, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 9, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 156 ms,  elapsed time = 228 ms.</p></blockquote>
<p>Radically different execution plans, radically different performance characteristics. The NOT IN used over 5 times as much CPU time, took more than 3 times as long to run, and did over 500,000 logical reads against SmallerTable where the NOT EXISTS did 9.</p>
<p>Why is that complex execution plan required when there may be nulls in the column? I can&#8217;t answer that one; probably only one of the query optimiser developers can. However, the results are obvious. When the columns allow nulls but contain none, NOT IN performs significantly worse than NOT EXISTS.</p>
<p>So, take-aways from this?</p>
<p>Most importantly, NOT EXISTS and NOT IN do not have the same behaviour when there are NULLs involved. Choose carefully which you want.</p>
<p>Columns that will never contain NULL values should be defined as NOT NULL so that SQL Server knows there will never be NULL values in them and so that it doesn’t have to produce complex plans to handle potential nulls.</p>
<p>On non-nullable columns, the behaviour and performance of NOT IN and NOT EXISTS are the same, so use whichever one works better for the specific situation.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>SQL Server Usergroup &#8211; February meeting</title>
		<link>http://sqlinthewild.co.za/index.php/2010/02/10/sql-server-usergroup-february-meeting/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/02/10/sql-server-usergroup-february-meeting/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 07:00:26 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Community]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=547</guid>
		<description><![CDATA[The February meeting of the SA SQL Server usergroup will be on the 16th of February 2010. Venue and time are the same as always, 18h30 at the Microsoft offices.
This month, Richard Sweetnam, one of Microsoft SA&#8217;s Premier Field Engineers, will be presenting on Tips and Tricks for Management Studio.
Hope to see you all there.
]]></description>
			<content:encoded><![CDATA[<p>The February meeting of the SA SQL Server usergroup will be on the 16th of February 2010. Venue and time are the same as always, 18h30 at the Microsoft offices.</p>
<p>This month, Richard Sweetnam, one of Microsoft SA&#8217;s Premier Field Engineers, will be presenting on Tips and Tricks for Management Studio.</p>
<p>Hope to see you all there.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/02/10/sql-server-usergroup-february-meeting/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Genetic Algorithms</title>
		<link>http://sqlinthewild.co.za/index.php/2010/01/28/genetic-algorithms/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/01/28/genetic-algorithms/#comments</comments>
		<pubDate>Thu, 28 Jan 2010 14:00:59 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Personal]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=531</guid>
		<description><![CDATA[Warning. This post has absolutely nothing to do with SQL Server.
What are Genetic Algorithms?
Genetic algorithms are a form of evolutionary computation, a branch of artificial intelligence that focuses on evolving effective or optimal solutions to difficult problems, based on the biological theory of evolution.
Genetic algorithms are, at their core, a search/optimisation technique. They are a [...]]]></description>
			<content:encoded><![CDATA[<p>Warning. This post has absolutely nothing to do with SQL Server.</p>
<h3>What are Genetic Algorithms?</h3>
<p>Genetic algorithms are a form of evolutionary computation, a branch of artificial intelligence that focuses on evolving effective or optimal solutions to difficult problems, based on the biological theory of evolution.</p>
<p>Genetic algorithms are, at their core, a search/optimisation technique. They are a way of finding maximum/minimum solutions to problems and can be effective when there is no efficient algorithmic solution to the problem. An example here would be the ‘Travelling Salesman’ problem.</p>
<p>Genetic algorithms work by taking an initial population of potential solutions (referred to as individuals), selecting a subset of the population that has the highest fitness, then using that subset to generate a second generation. From the second generation again a subset with the highest fitness is selected and used to generate a third generation. This repeats until either the &#8216;fittest’ individual is considered a good enough solution, or until a certain number of generations have passed.</p>
<p>There are advantages to using genetic algorithms to solve problems over more traditional methods like <a href="http://en.wikipedia.org/wiki/Hill_climbing">hill climbing</a>.</p>
<ul>
<li>Genetic algorithms can quickly produce good solutions, though they may take a lot of time to find the best solution. This is a benefit when the problem is such that the absolute best solution is not necessary, just one that is ‘good enough’.</li>
<li>They are far less susceptible than hill climbing to getting trapped by local maxima.</li>
<li>They do not work on the entire search space one potential solution at a time, but rather work on populations of potential solutions, focusing towards more optimal areas of the search space.</li>
</ul>
<p>A genetic algorithm will almost always find an optimal solution, given enough time. The main downside is that they may take a lot of time to find that optimal solution.</p>
<p><span id="more-531"></span></p>
<h3>Components of a Genetic Algorithm</h3>
<p>There are two main critical parts in setting up a genetic algorithm for a problem.</p>
<ul>
<li>The encoding of the potential solutions into a form where they can be operated on.</li>
<li>The fitness function which defines which individuals are better than others, which are closer to the maximum that is being searched for.</li>
</ul>
<p>Most of the design work when using genetic algorithms goes into those two problems.</p>
<h4>Encoding</h4>
<p>Encoding is the process of taking all the values that make up a potential solution and turning them into a form that the genetic algorithm can operate on.</p>
<p>The selection of an encoding is of utmost importance to the effectiveness of the entire process and a poor representation can make the entire problem much harder than it should be. Unfortunately there has been little academic work done on the process of designing representations.</p>
<p>Often for genetic algorithms, the end result of the encoding will be a binary string. There are other variations of evolutionary computation that use other representations, from the arrays of real numbers used by evolutionary strategies to the code trees used by genetic programming.</p>
<h4>Fitness function</h4>
<p>Depending on the problem, the fitness function can be trivial to write or near-impossible. The design of the fitness function is completely based on the problem that is being solved.</p>
<p>There are two important considerations for a fitness function.</p>
<ul>
<li>It must be deterministic.</li>
<li>It must be fast.</li>
</ul>
<p>If the fitness of an individual is assessed twice, it must come to the same value<sup>1</sup>. If the fitness function could return different values for the same individual, then it is of no use in determining the fittest individuals in the population and hence the genetic algorithm will not be able to identify the best solution to the problem.</p>
<p>The fitness of each individual is assessed at least once in each generation. The calculation of the fitness function is usually the most time-consuming part of the entire process, and the longer the fitness function takes to run, the longer the entire process takes.</p>
<p>(1) There are cases where the fitness of an individual may depend on external factors which change over time. Hence a fitness function may give different values for one individual if calculated at different times. Genetic algorithms in a changing environment are a little beyond the scope of this entry.</p>
<h3>Evolution Process</h3>
<p>In order to create a new generation, the fittest individuals from the previous generation are taken and used to generate the next generation. There are two main operators that are used to generate a generation from the previous one: crossover and mutation.</p>
<h4>Crossover</h4>
<p>Crossover involves taking two individuals, splitting each one’s encoded string and swapping parts to generate two new individuals.</p>
<p>Say we had two individuals with the following encoded strings (spaces added for clarity)<br />
0000 0001 1111 1110<br />
0101 1010 1100 0011<br />
and we chose the splitting point for the crossover after the 4th bit, the resulting strings after the crossover will be<br />
0000 1010 1100 0011<br />
0101 0001 1111 1110</p>
<p>In genetic algorithms crossover is the primary operator used. What I described here was a single-point crossover. There are a number of other variations that can be used.</p>
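<p>Sticking with T-SQL purely so that it can be run in Management Studio, the crossover in the example can be sketched with integer arithmetic. The assumptions here are mine: each 16-bit string is held as an INT, the variable names are made up, and the inline DECLARE initialisation needs SQL Server 2008 or later.</p>
<pre class="brush: sql;">DECLARE @ParentA INT = 510;   -- 0000 0001 1111 1110
DECLARE @ParentB INT = 23235; -- 0101 1010 1100 0011
DECLARE @Split INT = 4096;    -- 2^12: splits after the 4th bit, leaving a 12-bit tail

SELECT
    -- first 4 bits of A followed by the last 12 bits of B
    (@ParentA / @Split) * @Split + (@ParentB % @Split) AS Child1, -- 2755  = 0000 1010 1100 0011
    -- first 4 bits of B followed by the last 12 bits of A
    (@ParentB / @Split) * @Split + (@ParentA % @Split) AS Child2; -- 20990 = 0101 0001 1111 1110</pre>
<p>Changing @Split to 256 moves the crossover point to the middle of the string.</p>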
<h4>Mutation</h4>
<p>Mutation is an operator applied to a single individual. It’s usually applied after crossover has generated new individuals. Mutation involves flipping a single bit somewhere in the encoded string.</p>
<p>Let’s take the two individuals that were generated by the crossover earlier and apply a random mutation to each<br />
0000 1010 1100 0011<br />
0101 0001 1111 1110<br />
After<br />
0000 1010 1101 0011<br />
0101 0001 1011 1110</p>
<p>In genetic algorithms mutation is very seldom applied and only a small percentage of individuals in a generation will be affected by the mutation operator.</p>
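<p>A single-bit mutation is just an exclusive OR with a mask that has one bit set. Again this is only a sketch in T-SQL under my own assumptions (the individual held as an INT, made-up variable names, SQL Server 2008 or later for the inline DECLARE initialisation):</p>
<pre class="brush: sql;">DECLARE @Individual INT = 2755; -- 0000 1010 1100 0011
DECLARE @BitMask INT = 16;      -- 0000 0000 0001 0000: the one bit chosen to flip

-- XOR flips exactly the bits that are set in the mask
SELECT @Individual ^ @BitMask AS Mutated; -- 2771 = 0000 1010 1101 0011

-- Picking the bit at random instead: a random position from 0 to 15
SELECT @Individual ^ POWER(2, ABS(CHECKSUM(NEWID()) % 16)) AS RandomlyMutated;</pre>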
<h3>Example</h3>
<p>As a quick example let’s manually evolve a simple function to see how the whole thing works.</p>
<p>Let’s say I have an array of 4 numbers (call it num), each between 0 and 15. I want to know what values give me the highest value for the following.</p>
<p>num[1]-num[2]-num[3]+num[4]</p>
<p>I know, that’s simple enough that we could work out the optimal solution just by eye. Not the point. This is enough to do a quick and effective demo with.</p>
<p>I’m going to encode that by simply converting the numbers in the array to binary and concatenating the binary representations of the 4 numbers (spaces just added for clarity). The fitness function is already defined. I’m going to start with an initial population of eight individuals.</p>
<p>1111 1111 1100 1110 – fitness = (15-15-12+14) = 2<br />
0101 1010 1100 0011 – fitness = (5-10-12+3) = -14<br />
1011 0111 0011 1111 – fitness = (11-7-3+15) = 16<br />
1111 1001 1010 0011 – fitness = (15-9-10+3) = -1<br />
1010 1010 1010 1010 – fitness = (10-10-10+10) = 0<br />
1000 0010 0111 0110 – fitness = (8-2-7+6) = 5<br />
0000 0001 1111 1110 – fitness = (0-1-15+14) = -2<br />
1010 0101 0010 0101 – fitness = (10-5-2+5) = 8</p>
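<p>For anyone who would rather not do the binary arithmetic by hand, the fitness calculation can be sketched in T-SQL. The table variable and the choice to store each individual as an INT are my assumptions for illustration, and the multi-row VALUES syntax needs SQL Server 2008 or later.</p>
<pre class="brush: sql;">-- Hypothetical table variable holding the encoded individuals as integers
DECLARE @Population TABLE (Encoded INT NOT NULL);

INSERT INTO @Population (Encoded)
VALUES (CAST(0xFFCE AS INT)), -- 1111 1111 1100 1110
       (CAST(0x5AC3 AS INT)), -- 0101 1010 1100 0011
       (CAST(0xB73F AS INT)), -- 1011 0111 0011 1111
       (CAST(0xF9A3 AS INT)), -- 1111 1001 1010 0011
       (CAST(0xAAAA AS INT)), -- 1010 1010 1010 1010
       (CAST(0x8276 AS INT)), -- 1000 0010 0111 0110
       (CAST(0x01FE AS INT)), -- 0000 0001 1111 1110
       (CAST(0xA525 AS INT)); -- 1010 0101 0010 0101

-- Decode each group of 4 bits back into a number, then apply
-- num[1]-num[2]-num[3]+num[4]; the fittest individuals sort first
SELECT Encoded,
      (Encoded / 4096) % 16
    - (Encoded / 256) % 16
    - (Encoded / 16) % 16
    + (Encoded % 16) AS Fitness
FROM @Population
ORDER BY Fitness DESC;</pre>
<p>Adding TOP (4) to the SELECT keeps just the individuals that go forward as parents for the next generation.</p>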
<p>From this I’m going to take the 4 individuals with the highest fitness, use crossover operations (with the crossover point exactly in the middle) between them until I have 8 individuals for the 2nd generation and then apply a single bit mutation to one of the individuals (detailed steps left as an exercise for the reader)</p>
<p>1111 1111 0011 1111 – fitness = (15-15-3+15) = 12<br />
1011 0111 1100 1110 – fitness = (11-7-12+14) = 6<br />
1111 1011 0010 0101 – fitness = (15-11-2+5) = 7<br />
1010 0101 1100 1110 – fitness = (10-5-12+14) = 7<br />
1011 0111 0111 0110 – fitness = (11-7-7+6) = 3<br />
1000 0010 0011 1111 – fitness = (8-2-3+15) = 18<br />
1010 0101 0111 0110 – fitness = (10-5-7+6) = 4<br />
1000 0010 0010 0101 – fitness = (8-2-2+5) = 9</p>
<p>We can already see an improvement. The average fitness is much higher than for the first generation, and the maximum has risen as well. I’ll do one more generation in this example, again taking the 4 fittest individuals, crossing over to generate 8 new individuals and then applying a single bit mutation to two individuals. This time, however, the crossover point will be between the 4th and 5th bit.</p>
<p>1000 1111 0011 1111 &#8211; fitness = (8-15-3+15) = 5<br />
1111 0010 0011 1111 &#8211; fitness = (15-2-3+15) = 25<br />
1000 1011 0010 0101 &#8211; fitness = (8-11-2+5) = 0<br />
1111 0010 0011 1111 &#8211; fitness = (15-2-3+15) = 25<br />
1111 0010 1010 0101 &#8211; fitness = (15-2-10+5) = 8<br />
1000 0111 0011 1111 &#8211; fitness = (8-7-3+15) = 13<br />
1111 0010 0010 0101 &#8211; fitness = (15-2-2+5) = 16<br />
1000 1011 0010 0101 &#8211; fitness = (8-11-2+5) = 0</p>
<p>I think that’s enough for this example. We’re getting fairly close to the best possible solution (15,0,0,15), close enough to see how this works. The population size was very low; that’s why there are duplicates appearing in the results of the crossover. With a larger search space there would be a lot more diversity.</p>
<p>I hope that anyone still reading found this brief diversion into the realms of AI interesting. The regular SQL-related posts will return soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/01/28/genetic-algorithms/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Getting here from there</title>
		<link>http://sqlinthewild.co.za/index.php/2010/01/21/getting-here-from-there/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/01/21/getting-here-from-there/#comments</comments>
		<pubDate>Thu, 21 Jan 2010 20:30:23 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Misc]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=535</guid>
		<description><![CDATA[Another month, another blog chain, this time started by Paul Randal. I got tagged by both Grant and Steve, on the same day.
I could easily think of two events that dramatically influenced where I am today, finding a third with as major an impact was difficult. I think the third one qualifies as an important [...]]]></description>
			<content:encoded><![CDATA[<p>Another month, another blog chain, this time started by <a href="http://sqlskills.com/BLOGS/PAUL/post/What-three-events-brought-you-here.aspx">Paul Randal</a>. I got tagged by both <a href="http://scarydba.wordpress.com/">Grant</a> and <a href="http://www.sqlservercentral.com/blogs/steve_jones/archive/2010/01/18/what-three-things-brought-me-here.aspx">Steve</a>, on the same day.</p>
<p>I could easily think of two events that dramatically influenced where I am today; finding a third with as major an impact was difficult. I think the third one qualifies as an important enough event: while it didn&#8217;t really affect my career, it did influence my community involvement.</p>
<h3>I canna take it anymore</h3>
<p>I grew up surrounded by two things, computers and science fiction.</p>
<p>My father was a computer programmer in those days (today he runs a software company) and there were computers around from the earliest I remember. From the <a href="http://www.old-computers.com/MUSEUM/computer.asp?st=1&amp;c=174">Sharp</a> that I played Asteroids and The Valley on, to the <a href="http://www.old-computers.com/MUSEUM/computer.asp?st=1&amp;c=539">NCR</a> with its beeping keyboard where I first started programming (in a variant of BASIC), to the 80286 that my father gave me when he bought himself something faster, I&#8217;ve always had computers around that I could use. Despite that, I never had any intention of going into IT as a career.</p>
<p>My mother is a trekkie (classic Star Trek only please) so I grew up watching (and reading) lots of science fiction. From Star Trek to Dr Who to Battlestar Galactica to the entire science fiction collection at the local library, I watched and read everything I could get my hands on, and it wasn&#8217;t long before I started reading science fact as well as science fiction. By the time I got to high school my career plans were leaning in the direction of Physics and Astronomy. Placing very high in the national Science Olympiad and almost winning a trip to Space Camp just strengthened those intentions. I enjoyed playing with computers, but that was more a hobby (and, by that point, a place to play games).</p>
<p>I entered university with the intention to major in Physics, take a related subject as my second major and then get an Honours degree<sup>1</sup> in Physics and find a job in astronomy or physics research. I took Computer Science as my second major because it was one of the few subjects that I was interested in that didn&#8217;t conflict with the other subjects I had to take (Chemistry 1 and Maths 1). I spent most of my spare time in my first two years in the Physics department library. I reckon that I must have read easily a third of that library in those two years.</p>
<p>Just two problems with that intention. Firstly, there&#8217;s almost no demand in this country for physicists other than the universities and the national observatory. Secondly, by the time I got to 3rd year physics, I couldn&#8217;t handle the Maths involved. It was part way through the course on Quantum Physics (which contained more maths than some of the 3rd year maths courses did) that I realised that if I couldn&#8217;t handle the maths at this point, there was no way I&#8217;d ever be able to get a post-grad degree in physics.</p>
<p>I finished the Bachelors degree majoring in Physics and Computer Science and then applied for the honours degree in the Computer Science department.</p>
<p>(1) In South Africa the Honours degree is a one year post-grad degree that sits between the Bachelors degree and the Masters degree.</p>
<h3><span id="more-535"></span>Don&#8217;t hold back, tell me what you really think.</h3>
<p>Fast forward about five and a half years.</p>
<p>I&#8217;d been doing assorted development work since leaving university. Starting with Oracle Reports, moving into MS Access and Visual Basic and finally into the web development area. I loved web dev. It was complex enough to be a challenge but not so complex that it wasn&#8217;t fun, and the whole thing just made sense to me.</p>
<p>I was working at the home loans division of a major bank doing a mixture of new development, maintenance of existing code and reports. When the main application started having performance problems, the company got a consultant in to fix it, and I got assigned to help out because I knew SQL better than any of the other devs (translate, I could write a select statement across multiple tables). I&#8217;ve spoken before about what I learnt from that.  After that the consultant got involved in other projects and I went on with other work, but there were a lot of problems in the department and so I was looking for somewhere else to work.</p>
<p>I told the consultant that I was planning to resign and I was looking for a job in development, preferably web development. His reply was rather blunt, something along the lines of &#8220;Don&#8217;t be stupid, that&#8217;s a waste of your time and talents.&#8221; He then spoke of the type of work that he and his colleague did, enterprise server stuff, Biztalk, SQL, Exchange, etc. It was a surprise just how much of that work there was.</p>
<p>I won&#8217;t say that I immediately took his advice and swore off web dev forever. I didn&#8217;t. It did however get me thinking and reading and realising just how much there was to that kind of work, how much there was to IT that wasn&#8217;t front-end development and wasn&#8217;t system administration.</p>
<p>When I took another job with a different bank a couple of months later, one of the conditions that I asked for was to have the option of moving from the DB developer role back into web development if I asked. They agreed. I never took the option up.</p>
<h3>Can I ask your opinion?</h3>
<p>Fast forward another couple of years. It was 2006 and I was attending my very first TechEd South Africa. By this point I was doing performance tuning full time at the company I worked at, and I knew a fair bit about SQL Server and I posted on the forums occasionally. I&#8217;d attended the PASS European Summit the year before and the competency and knowledge of some of the speakers there had stunned me.</p>
<p>Microsoft had managed to get a few really good international speakers for that TechEd, including one who presented on SQL Server (who, incidentally, had also presented at PASS Europe the year before). After the conference was over, just before the closing keynote, I saw the SQL Server speaker in the passage heading, like I was, for the keynote. So I went over, said &#8216;hi&#8217; and all that, and asked one question (well, two if you count &#8216;Can I help you with your bags?&#8217;).</p>
<p>&#8220;How do I get to where you are?&#8221;</p>
<p>The answer took the entire walk to the keynote and some more time. None of the advice was implemented immediately. Some (this blog) took almost a year, some I never implemented, but it got me thinking and it got me contributing to the community, not just being a passive recipient of information.</p>
<p>There&#8217;s other things I could talk about, like the six months of writing Oracle reports and what that taught me about writing select statements or the invitation to join a roleplaying group that helped get over my problems with speaking in public, but I think this is about enough.</p>
<p>Everyone I know seems already tagged, so we&#8217;ll just leave it there&#8230;.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/01/21/getting-here-from-there/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>SQL Pass session evaluations</title>
		<link>http://sqlinthewild.co.za/index.php/2010/01/13/sql-pass-session-evaluations/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/01/13/sql-pass-session-evaluations/#comments</comments>
		<pubDate>Wed, 13 Jan 2010 14:43:20 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Pass 2009]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/index.php/2010/01/13/sql-pass-session-evaluations/</guid>
		<description><![CDATA[I finally got the last of my PASS Summit session evals and so, like some other people, I thought I’d make them public.
Lies, damned lies and statistics (DBA-388-S)
This session went very well. I was comfortable with the material, it’s a topic I really like and in general it felt, to me at least, like a [...]]]></description>
			<content:encoded><![CDATA[<p>I finally got the last of my PASS Summit session evals and so, like <a href="http://scarydba.wordpress.com/2010/01/06/pass-summit-evaluations/">some</a> <a href="http://sqlblog.com/blogs/louis_davidson/archive/2010/01/08/the-reviews-are-in-and-i-still-have-things-to-improve.aspx">other</a> <a href="http://sqlblog.com/blogs/andy_leonard/archive/2010/01/07/pass-summit-evaluations.aspx">people</a>, I thought I’d make them public.</p>
<h3>Lies, damned lies and statistics (DBA-388-S)</h3>
<p>This session went very well. I was comfortable with the material, it’s a topic I really like and in general it felt, to me at least, like a good session. The ratings seem to agree with that.</p>
<table border="1" cellspacing="0" cellpadding="2" width="495">
<tbody>
<tr>
<td width="157" valign="top"></td>
<td width="65" align="center">Very Poor</td>
<td width="65" align="center">Poor</td>
<td width="65" align="center">Average</td>
<td width="65" align="center">Good</td>
<td width="65" align="center">Excellent</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the usefulness of the session information in your day-to-day environment?</td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" align="center">1</td>
<td width="65" align="center">7</td>
<td width="65" align="center">36</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the Speaker&#8217;s presentation skills?</td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" align="center">3</td>
<td width="65" align="center">5</td>
<td width="65" align="center">36</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the Speaker&#8217;s knowledge of the subject?</td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" align="center">4</td>
<td width="65" align="center">40</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the accuracy of the session title, description, and experience level to the actual session?</td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" align="center">5</td>
<td width="65" align="center">39</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the amount of time allocated to cover the topic/session?</td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" align="center">11</td>
<td width="65" align="center">33</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the quality of the presentation materials?</td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" align="center">1</td>
<td width="65" align="center">7</td>
<td width="65" align="center">36</td>
</tr>
</tbody>
</table>
<p>If I make Very Poor = 1 and Excellent = 5 then, averaging all the scores over all the questions, overall that session rated at 4.81/5.</p>
<p>Not bad at all.</p>
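<p>The averaging described above can be reproduced with a short query. This is only a sketch: the @Ratings table variable and its column names are made up for illustration, and the response counts are copied from the table above (Average = 3, Good = 4, Excellent = 5).</p>
<pre class="brush: sql;">-- One row per question and rating that received at least one response
DECLARE @Ratings TABLE (Question int, Score int, Responses int)

INSERT INTO @Ratings (Question, Score, Responses)
VALUES (1, 3, 1), (1, 4, 7), (1, 5, 36),
       (2, 3, 3), (2, 4, 5), (2, 5, 36),
       (3, 4, 4), (3, 5, 40),
       (4, 4, 5), (4, 5, 39),
       (5, 4, 11), (5, 5, 33),
       (6, 3, 1), (6, 4, 7), (6, 5, 36)

-- Weighted average: total points divided by total number of responses
-- The CAST avoids integer division
SELECT CAST(SUM(Score * Responses) AS decimal(9,2)) / SUM(Responses) AS OverallRating
FROM @Ratings</pre>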
<p>Edit: The overall <a href="http://www.sqlpass.org/Events/BestOfSummit.aspx">PASS Summit session ratings</a> are out and this session came in at 7th overall (all sessions including pre/post cons, all tracks) and 5th in the DBA track, behind only <a href="http://blogs.msdn.com/buckwoody/">Buck Woody</a>, <a href="http://sqlskills.com/blogs/kimberly/">Kimberly Tripp</a> and <a href="http://sqlskills.com/blogs/paul/">Paul Randal</a>. I am extremely surprised to have come in that high at a conference like the PASS Summit.</p>
<h3>Insight into Indexes (DBA-315)</h3>
<p>This session was a whole different story. It did not go well at all, and I didn’t need the ratings to tell me that.</p>
<p>I wasn’t overly comfortable with the material. This is not to say that I didn’t know it, I did, but I wasn’t comfortable with it. In retrospect, I should have scrapped the entire presentation and done it over from scratch in a different way, even if that meant doing it the night before. Lesson learnt there.</p>
<p>To add to that, I broke my own rules for presentations. Usually I’m at the session room at least 5 minutes before the previous session finishes, with my laptop booted, the presentation loaded, management studio (and profiler if necessary) open and any pre-demo scripts already run. That way, as soon as the speaker who’s presenting in the session before mine finishes, I can get on stage, plug the laptop in, get the projector online and then relax.</p>
<p>In this case, I was late. The previous speaker had already left and my laptop was still switched off. Hence I rushed to get everything loaded and ready, and Windows, sensing the urgency, promptly crashed hard.</p>
<p>Cue 2 minutes of frantically trying to reboot laptop (it was ignoring all shut down requests) and load presentation onto the desktop in case my laptop didn’t reboot. All while the AV guy’s trying to get the audio on and the recording started.</p>
<p>Let’s just say it went downhill from there.</p>
<p>So, ratings for that one.</p>
<table border="1" cellspacing="0" cellpadding="2" width="495">
<tbody>
<tr>
<td width="157" valign="top"></td>
<td width="65" align="center">Very poor</td>
<td width="65" align="center">Poor</td>
<td width="65" align="center">Average</td>
<td width="65" align="center">Good</td>
<td width="65" align="center">Excellent</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the usefulness of the session information in your day-to-day environment?</td>
<td width="65" align="center">2</td>
<td width="65" align="center">1</td>
<td width="65" align="center">7</td>
<td width="65" align="center">23</td>
<td width="65" align="center">51</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the Speaker&#8217;s presentation skills?</td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" align="center">5</td>
<td width="65" align="center">29</td>
<td width="65" align="center">50</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the Speaker&#8217;s knowledge of the subject?</td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" align="center">1</td>
<td width="65" align="center">11</td>
<td width="65" align="center">72</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the accuracy of the session title, description, and experience level to the actual session?</td>
<td width="65" valign="top"></td>
<td width="65" align="center">1</td>
<td width="65" align="center">4</td>
<td width="65" align="center">31</td>
<td width="65" align="center">48</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the amount of time allocated to cover the topic/session?</td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" align="center">6</td>
<td width="65" align="center">31</td>
<td width="65" align="center">47</td>
</tr>
<tr>
<td width="157" valign="top">How would you rate the quality of the presentation materials?</td>
<td width="65" valign="top"></td>
<td width="65" valign="top"></td>
<td width="65" align="center">4</td>
<td width="65" align="center">33</td>
<td width="65" align="center">47</td>
</tr>
</tbody>
</table>
<p>If I do the same averaging as for the first one, that comes out at 4.55. Not the worst I’ve ever had, though not by much. Lessons learnt.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/01/13/sql-pass-session-evaluations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IN vs INNER JOIN</title>
		<link>http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 14:00:58 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=394</guid>
		<description><![CDATA[Often in forum threads discussing query performance I&#8217;ll see people recommending replacing an INNER JOIN with an IN (or recommending replacing an IN with an INNER JOIN) for performance reasons. Hence it seems to be a good idea to investigate and see what the performance differences (if any) really are.
One very important thing to note [...]]]></description>
			<content:encoded><![CDATA[<p>Often in forum threads discussing query performance I&#8217;ll see people recommending replacing an INNER JOIN with an IN (or recommending replacing an IN with an INNER JOIN) for performance reasons. Hence it seems to be a good idea to investigate and see what the performance differences (if any) really are.</p>
<p>One very important thing to note right off is that they are not equivalent in all cases.</p>
<p>An inner join between two tables does a complete join, it checks for matches and returns rows. This means, if there are multiple matching rows in the second table, multiple rows will be returned. Also, when two tables are joined, columns can be returned from either.  As a quick example (definition of BigTable towards the end of the post)</p>
<pre class="brush: sql;">DECLARE @SomeTable TABLE (IntCol int)
Insert into @SomeTable (IntCol) Values (1)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (3)
Insert into @SomeTable (IntCol) Values (4)
Insert into @SomeTable (IntCol) Values (5)
Insert into @SomeTable (IntCol) Values (5)

SELECT *
 FROM BigTable b INNER JOIN @SomeTable  s ON b.SomeColumn = s.IntCol</pre>
<p>This returns 7 rows and returns columns from both tables. Because the values in @SomeTable are duplicated, the matching rows from BigTable are returned twice.</p>
<p>With an IN, what is done is a semi-join, a join that checks for matches but does not return rows. This means if there are multiple matching rows in the resultset used for the IN, it doesn&#8217;t matter. Only one row from the first table will be returned. Also, because the rows are not returned, columns from the table referenced in the IN cannot be returned. As a quick example</p>
<pre class="brush: sql;">DECLARE @SomeTable TABLE (IntCol int)
Insert into @SomeTable (IntCol) Values (1)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (2)
Insert into @SomeTable (IntCol) Values (3)
Insert into @SomeTable (IntCol) Values (4)
Insert into @SomeTable (IntCol) Values (5)
Insert into @SomeTable (IntCol) Values (5)

SELECT *
 FROM BigTable
 WHERE SomeColumn IN (Select IntCol FROM @SomeTable)</pre>
<p>This returns 5 rows and only columns from BigTable.</p>
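<p>To see the difference without needing BigTable at all, here is a self-contained sketch. The table variables @OuterTable and @LookupTable are stand-ins invented for this illustration, with @OuterTable holding exactly one row per value.</p>
<pre class="brush: sql;">-- Hypothetical stand-in for BigTable: one row for each value 1 to 5
DECLARE @OuterTable TABLE (SomeColumn int)
INSERT INTO @OuterTable (SomeColumn) VALUES (1), (2), (3), (4), (5)

-- Lookup values, with 2 and 5 duplicated
DECLARE @LookupTable TABLE (IntCol int)
INSERT INTO @LookupTable (IntCol) VALUES (1), (2), (2), (3), (4), (5), (5)

-- Inner join: the rows for 2 and 5 each come back twice, 7 rows in total
SELECT o.SomeColumn, l.IntCol
FROM @OuterTable o
    INNER JOIN @LookupTable l ON o.SomeColumn = l.IntCol

-- IN: each row of @OuterTable is returned at most once, 5 rows in total
SELECT o.SomeColumn
FROM @OuterTable o
WHERE o.SomeColumn IN (SELECT IntCol FROM @LookupTable)</pre>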
<p>So, that said, how does the performance of the two differ for the cases where the results are identical (no duplicates in the second table, no columns needed from the second table)? For that, I&#8217;m going to need larger tables to play with.<span id="more-394"></span></p>
<pre class="brush: sql;">
DROP TABLE dbo.BigTable
DROP TABLE dbo.SmallerTable

CREATE TABLE BigTable (
id INT IDENTITY PRIMARY KEY,
SomeColumn CHAR(4),
Filler CHAR(100)
)

CREATE TABLE SmallerTable (
id INT IDENTITY PRIMARY KEY,
LookupColumn CHAR(4),
SomeArbDate DATETIME DEFAULT GETDATE()
)

INSERT INTO BigTable (SomeColumn)
SELECT top 250000 char(65+FLOOR(RAND(a.column_id *5645 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *3784 + b.object_id)*12)) + char(65+FLOOR(RAND(b.column_id *6841 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *7544 + b.object_id)*8))
FROM master.sys.columns a CROSS JOIN master.sys.columns b

INSERT INTO SmallerTable (LookupColumn)
SELECT DISTINCT SomeColumn
FROM BigTable TABLESAMPLE (25 PERCENT)
-- (3819 row(s) affected)
</pre>
<p>That&#8217;s the setup done, now for the two test cases. Let’s first try without indexes and see how the INNER JOIN and IN compare. I&#8217;m selecting from just the first table to ensure that the two queries are logically identical. The DISTINCT used to populate the smaller table ensures that there are no duplicate rows in the smaller table.</p>
<pre class="brush: sql;">SELECT BigTable.ID, SomeColumn
FROM BigTable
WHERE SomeColumn IN (SELECT LookupColumn FROM dbo.SmallerTable)

SELECT BigTable.ID, SomeColumn
FROM BigTable
INNER JOIN SmallerTable ON dbo.BigTable.SomeColumn = dbo.SmallerTable.LookupColumn</pre>
<p>Something of interest straight away: the execution plans are almost identical. Not completely identical, but the only difference is that the hash join for the IN shows a Hash Match (Right Semi Join) and the hash join for the INNER JOIN shows a Hash Match (Inner Join).</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/01/InVsSelect-1.png"><img class="alignnone size-thumbnail wp-image-513" style="border: 1px solid black;" title="In Vs Select 1" src="http://sqlinthewild.co.za/wp-content/uploads/2010/01/InVsSelect-1-150x150.png" alt="In Vs Select 1" width="150" height="150" /></a></p>
<p>The IOs are the same and the durations are extremely similar. Here&#8217;s the IO results and durations for five tests.</p>
<p>IN</p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 156 ms,  elapsed time = 2502 ms.<br />
CPU time = 157 ms,  elapsed time = 2323 ms.<br />
CPU time = 156 ms,  elapsed time = 2555 ms.<br />
CPU time = 188 ms,  elapsed time = 2381 ms.<br />
CPU time = 203 ms,  elapsed time = 2312 ms.</p></blockquote>
<p>INNER JOIN</p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.</p>
<p>SQL Server Execution Times:<br />
CPU time = 125 ms,  elapsed time = 2922 ms.<br />
CPU time = 140 ms,  elapsed time = 2372 ms.<br />
CPU time = 188 ms,  elapsed time = 2530 ms.<br />
CPU time = 203 ms,  elapsed time = 2323 ms.<br />
CPU time = 187 ms,  elapsed time = 2512 ms.</p></blockquote>
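<p>Figures in this format are what SQL Server prints to the Messages tab when the STATISTICS IO and STATISTICS TIME session options are switched on. A sketch of how the two test queries could be wrapped to capture them (an assumption about the method, the post doesn&#8217;t say exactly how the numbers were collected):</p>
<pre class="brush: sql;">-- Report reads per table and CPU/elapsed time for each statement
SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT BigTable.ID, SomeColumn
FROM BigTable
WHERE SomeColumn IN (SELECT LookupColumn FROM dbo.SmallerTable)

SELECT BigTable.ID, SomeColumn
FROM BigTable
INNER JOIN SmallerTable ON dbo.BigTable.SomeColumn = dbo.SmallerTable.LookupColumn

-- Switch the reporting off again for the rest of the session
SET STATISTICS IO OFF
SET STATISTICS TIME OFF</pre>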
<p>Now let&#8217;s try with some indexes on the join columns.</p>
<pre class="brush: sql;">CREATE INDEX idx_BigTable_SomeColumn ON BigTable (SomeColumn)
CREATE INDEX idx_SmallerTable_LookupColumn ON SmallerTable (LookupColumn)</pre>
<p>Now when I run the two queries, the execution plans are different, and the costs of the two are no longer 50% each. Both do a single index scan on each table, but the IN has a Merge Join (Inner Join) and the INNER JOIN has a Hash Match (Inner Join).</p>
<p><a href="http://sqlinthewild.co.za/wp-content/uploads/2010/01/InVsSelect-2.png"><img class="alignnone size-thumbnail wp-image-514" style="border: 1px solid black;" title="InVsSelect 2" src="http://sqlinthewild.co.za/wp-content/uploads/2010/01/InVsSelect-2-150x150.png" alt="InVsSelect 2" width="150" height="150" /></a></p>
<p>The IOs are still identical, other than the WorkTable that only appears for the Hash Join.</p>
<p>IN</p>
<blockquote><p>Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.</p></blockquote>
<p>INNER JOIN</p>
<blockquote><p>Table &#8216;Worktable&#8217;. Scan count 0, logical reads 0, physical reads 0.<br />
Table &#8216;BigTable&#8217;. Scan count 1, logical reads 3639, physical reads 0.<br />
Table &#8216;SmallerTable&#8217;. Scan count 1, logical reads 14, physical reads 0.</p></blockquote>
<p>So what about the durations? Honestly it&#8217;s hard to say anything completely conclusive; the durations of both queries are quite small and they are very close. To see if there is any measurable difference, I&#8217;m going to run each one 100 times, use Profiler to log the duration and CPU and then average the results over the 100 executions. While running this, I&#8217;m also going to close/disable everything else I can on the computer, to try and get reasonably accurate times.</p>
<p>IN</p>
<p>Average CPU: 130.<br />
Avg duration: 2.78 seconds</p>
<p>INNER JOIN</p>
<p>Average CPU: 161.<br />
Avg duration: 2.93 seconds</p>
<p>Now is that enough to be significant? I&#8217;m not sure. However, looking at those results along with the IO and execution plans, I do have a recommendation for IN vs INNER JOIN.</p>
<p>If all you need is to check for matching rows in the other table but don&#8217;t need any columns from that table, use IN. If you do need columns from the second table, use Inner Join.</p>
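<p>For anyone wanting to repeat the 100-execution timing test without Profiler, one possible alternative (not the method used for the numbers above) is to repeat the batch with the GO count feature of Management Studio and sqlcmd, and then read the averages from the sys.dm_exec_query_stats DMV. This sketch assumes the plan is still in cache when the DMV is queried.</p>
<pre class="brush: sql;">-- Run the batch 100 times (GO with a count is a client tool feature, not T-SQL)
SELECT BigTable.ID, SomeColumn
FROM BigTable
WHERE SomeColumn IN (SELECT LookupColumn FROM dbo.SmallerTable)
GO 100

-- The DMV stores times in microseconds; convert the averages to milliseconds
SELECT st.text,
    qs.execution_count,
    qs.total_worker_time / qs.execution_count / 1000.0 AS AvgCpuMs,
    qs.total_elapsed_time / qs.execution_count / 1000.0 AS AvgDurationMs
FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.text LIKE '%SmallerTable%'
    AND st.text NOT LIKE '%dm_exec_query_stats%' -- exclude this query itself</pre>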
<p>I still intend to go over NOT IN and NOT EXISTS and, after this one, I also want to take a look at the LEFT JOIN with IS NULL check vs NOT IN for when you want rows from Table1 that don&#8217;t have a match in Table 2.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Look back on 2009 and plans for the new year</title>
		<link>http://sqlinthewild.co.za/index.php/2010/01/01/look-back-on-2009-and-plans-for-the-new-year/</link>
		<comments>http://sqlinthewild.co.za/index.php/2010/01/01/look-back-on-2009-and-plans-for-the-new-year/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 22:00:32 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=502</guid>
		<description><![CDATA[Another year gone and a new one just starting. Time to take a look back at the goals that I set for myself, which ones I&#8217;ve achieved and which I haven&#8217;t.
I did fairly well with the goals that I set back in October. The experiment design for my thesis is not documented and the [...]]]></description>
			<content:encoded><![CDATA[<p>Another year gone and a new one just starting. Time to take a look back at the goals that I set for myself, which ones I&#8217;ve achieved and which I haven&#8217;t.</p>
<p>I did fairly well with the goals that I set <a href="http://sqlinthewild.co.za/index.php/2009/10/04/review-and-goals-for-the-rest-of-the-year/">back in October</a>. The experiment design for my thesis is not documented and the WPF book is not finished, but the rest is all done and the exam was easily passed. So not too bad there. Better than I managed in the first half of the year.</p>
<p>And now for next year&#8230;</p>
<p>The biggest thing is that I&#8217;m going to take a near-complete break from the SQL community for at least the first half of the year so that I can focus on my thesis. No writing articles, no contributing to books, no giving or writing presentations. I&#8217;ll still blog, though likely only once a month on SQL-related stuff, though there may well be some AI stuff appearing from time to time. I&#8217;ll still be around on the SSC forums, though not as much as I currently am. I&#8217;ll also still be involved in the local usergroup. I can&#8217;t abandon that.</p>
<p>So, with that out of the way, the goals for the next six months:</p>
<ul>
<li>Read one SQL book</li>
<li>Read at least two AI books</li>
<li>Get the experiment for my thesis designed and coded.</li>
<li>Write at least two chapters of the thesis</li>
<li>Get back into computer graphics and get two images done</li>
</ul>
<p>To add to that, strange as it may seem, I&#8217;m going to ensure that I take time to read, relax and exercise away from the computer. This year I&#8217;ve spent too much time &#8216;busy&#8217;. I have a stack of books almost a metre high that I&#8217;ve bought but not opened. I have a similar stack of movies and games (though not quite as high) and I had a nasty bout of burnout Oct/Nov and I really don&#8217;t want that again.</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2010/01/01/look-back-on-2009-and-plans-for-the-new-year/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Greatest weakness</title>
		<link>http://sqlinthewild.co.za/index.php/2009/12/18/greatest-weakness/</link>
		<comments>http://sqlinthewild.co.za/index.php/2009/12/18/greatest-weakness/#comments</comments>
		<pubDate>Fri, 18 Dec 2009 15:36:44 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Misc]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=500</guid>
		<description><![CDATA[Someone went off and started asking people what their greatest weaknesses are, then someone else decided to pass the question my way. Perhaps those someones’ weaknesses are curiosity&#8230; Still, since everyone else is revealing their darkest secrets, I’ll give it a go as well.
Mine would have to be two very closely related things. Procrastination and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.made2mentor.com/2009/12/%e2%80%9cwhat-is-your-biggest-weakness%e2%80%9d-the-classic-interview-question/">Someone</a> went off and started asking people what their greatest weaknesses are, then <a href="http://scarydba.wordpress.com/2009/12/15/what-is-your-greatest-weakness/">someone else</a> decided to pass the question my way. Perhaps those someones’ weaknesses are curiosity&#8230; Still, since everyone else is revealing their darkest secrets, I’ll give it a go as well.</p>
<p>Mine would have to be two very closely related things. Procrastination and short attention span. (why do you think this post is 4 days late?)</p>
<p>I tend to put stuff off until the absolute last minute (and sometimes beyond then) and then work frantically to get it finished. And because I’m also a bit of a perfectionist, I’m not satisfied with half-done work or a hack job so I&#8217;ll be working late into the night (or over a weekend) to get it done properly.</p>
<p>To add to that, unless I’m doing something I really enjoy, my attention span tends not to be very long. 15-20 minutes is good, it’s usually less. After that it’s either fight to stay focused or take a short break. In and of itself, that’s not so bad, the problem is that the breaks are often not so short, if I get caught up in whatever else I’m doing, or distracted by something else (and something else and something else).</p>
<p>So what am I doing about this?</p>
<p><span id="more-500"></span></p>
<p>Well, firstly, I have no solutions. There are things that I’m doing that are helping, but I won’t say they are general solutions.</p>
<h2>Procrastination</h2>
<p>I’m splitting big tasks up into small (very small) chunks and setting deadlines for those chunks. Sometimes more than one a day. That way, everything is always at the last minute, so I can’t put things off, much.</p>
<p>This works fairly well if I have a lot of things to do. It doesn’t tend to work well for me when I’ve got relatively few things that need doing and lots of time to do them in. Still, it is helping, as long as I take the time to break work down and list small tasks. I do need to apply a bit more discipline here to make sure that I do break work down and I do adhere to the task deadlines.</p>
<h2>Attention span</h2>
<p>This one’s harder. While I can push through when I start losing focus, it’s difficult and gets more difficult as time goes on.</p>
<p>Because I know that I can’t focus on one thing straight through for hours, I’ll usually have two things that I’m busy with that I can switch between. For example I’ll have an SSIS package that I’m developing and a profiler trace that I’m analysing and I can switch between them.</p>
<p>What I have to be careful about is time-consuming non-work related distractions. Which, seeing as I work from home with a big shelf of books behind me and an xbox in the lounge can be difficult. But that comes down to discipline. No games during the day. No more than an hour of gaming over supper when there’s work that needs doing.</p>
<p>This is also why, even though I’ve joined the Twitter madness, the Twitter client is closed for most of the day. It’s also why I’ve started closing Outlook for most of the day. If the clients want me urgently, they’ll phone. Mail can always wait a couple of hours.</p>
<p>Well, I think that’s enough about me. I’m curious to see what <a href="http://www.sqlservercentral.com/blogs/steve_jones/default.aspx">Steve Jones</a> and <a href="http://www.sqlskills.com/BLOGS/PAUL/">Paul Randal</a> have to say (if anything)</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2009/12/18/greatest-weakness/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Are trivial plans cached?</title>
		<link>http://sqlinthewild.co.za/index.php/2009/12/08/are-trivial-plans-cached/</link>
		<comments>http://sqlinthewild.co.za/index.php/2009/12/08/are-trivial-plans-cached/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 15:00:28 +0000</pubDate>
		<dc:creator>Gail</dc:creator>
				<category><![CDATA[Execution Plans]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Syndication]]></category>

		<guid isPermaLink="false">http://sqlinthewild.co.za/?p=495</guid>
		<description><![CDATA[It is sometimes said that trivial execution plans are not cached and queries that have such plans are compiled on every execution. So is that true? To effectively answer this question, we must first establish what a trivial plan is.
A trivial plan is essentially a plan for a query where a specific plan will always [...]]]></description>
			<content:encoded><![CDATA[<p>It is sometimes said that trivial execution plans are not cached and queries that have such plans are compiled on every execution. So is that true? To effectively answer this question, we must first establish what a trivial plan is.</p>
<p>A trivial plan is essentially a plan for a query where a specific plan will always be the optimal way of executing it. If we consider something like SELECT * FROM SomeTable then there&#8217;s only one real way to execute it, a scan of the cluster/heap.</p>
<p>The trivial plan is somewhat of a query optimiser optimisation. If the query qualifies for a trivial plan (and there are lots of restrictions) then the full optimisation process doesn&#8217;t need to be started and so the query&#8217;s execution plan can be generated quicker and with less overhead. The fact that a query has a trivial plan at one point doesn’t necessarily mean that it will always have a trivial plan. Indexes may be added that make the selection of plan less of a sure thing, and so the query must go for full optimisation rather than getting a trivial plan.</p>
<p>Nice theory, but how does one tell if a particular query has a trivial execution plan? The information is found within the execution plan: the properties of the highest-level operator have an entry &#8216;Optimisation level&#8217;. For a trivial plan this will read ‘TRIVIAL’.</p>
<p><img style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" title="Trivial plan" src="http://sqlinthewild.co.za/wp-content/uploads/2009/12/Trivialplan.png" border="0" alt="Trivial plan" width="270" height="362" /></p>
<p><span id="more-495"></span>It&#8217;s also, for the brave people who like XML, found within the XML form of the plan:</p>
<pre class="brush: xml;">&lt;StmtSimple StatementCompId=&quot;1&quot; StatementEstRows=&quot;11&quot; StatementId=&quot;1&quot; StatementOptmLevel=&quot;TRIVIAL&quot; StatementSubTreeCost=&quot;0.0032941&quot; StatementText=&quot;SELECT * FROM forums&quot; StatementType=&quot;SELECT&quot; QueryHash=&quot;0xB38EBF594006422E&quot; QueryPlanHash=&quot;0xCC9AB99E7081C81D&quot;&gt;
    &lt;!-- most of the rest of the plan here --&gt;
&lt;/StmtSimple&gt;</pre>
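<p>One way to get at that XML, sketched here on the assumption that the AdventureWorks sample database is available, is to switch on SET STATISTICS XML before running the query. The actual plan comes back as an extra result set, and the StatementOptmLevel attribute can be read from it.</p>
<pre class="brush: sql;">-- Return the actual execution plan as XML alongside the query results
SET STATISTICS XML ON;

-- Any query will do; this one assumes the AdventureWorks sample database
SELECT TOP (10) Name FROM Production.Product;

-- Switch the setting off again so later queries return only their results
SET STATISTICS XML OFF;</pre>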
<p>So, are they cached? The way to find that out is to run a variety of queries and see what&#8217;s sitting in the cache afterwards.</p>
<p>I&#8217;m going to clear the procedure cache, then run a few queries against the AdventureWorks database, checking the graphical execution plan for each one. After they&#8217;ve all been run, I&#8217;ll query the plan cache and check the optimisation level of the cached plans.</p>
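<p>For anyone following along, the procedure cache can be cleared with DBCC FREEPROCCACHE. This is a minimal sketch and should only be run on a test server, since it throws away every cached plan on the instance.</p>
<pre class="brush: sql;">-- Test servers only: this removes every cached plan on the instance,
-- so all queries must be recompiled on their next execution.
DBCC FREEPROCCACHE;</pre>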
<p>Query 1:</p>
<pre class="brush: sql;">SELECT FirstName, LastName
    FROM Person.Person
    WHERE BusinessEntityID = 42</pre>
<p>According to the graphical plan, this query has a trivial plan.</p>
<p>Query 2:</p>
<pre class="brush: sql;">SELECT TOP (10) Name FROM Production.Product</pre>
<p>According to the graphical plan, this query also has a trivial plan.</p>
<p>Query 3:</p>
<pre class="brush: sql;">SELECT * FROM Sales.SalesOrderHeader sh
    INNER JOIN Sales.SalesOrderDetail sd
        ON sh.SalesOrderID = sd.SalesOrderID
    WHERE sh.ShipDate &gt; '2008/05/25'</pre>
<p>According to the graphical plan, this query has a non-trivial plan; the optimisation level is listed as full.</p>
<p>Now to query the plan cache and check the optimisation level of each cached plan:</p>
<pre class="brush: sql;">SELECT st.text, qp.query_plan,
     qp.query_plan.value('
       declare default element
       namespace &quot;http://schemas.microsoft.com/sqlserver/2004/07/showplan&quot;;
       (//StmtSimple/@StatementOptmLevel)[1]','varchar(20)') AS OptimisationLevel
    FROM sys.dm_exec_cached_plans cp
        CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
    WHERE st.text NOT LIKE '%sys.dm_exec_cached_plans%'</pre>
<p>This returns four rows: all three of the queries that I ran against AdventureWorks (two with an optimisation level of Trivial and one with an optimisation level of Full), plus the unparameterised ‘shell’ of the first query.</p>
<p>So it appears that some trivial plans are indeed cached.</p>
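<p>A possible follow-up check, sketched here as an assumption about how one might extend the test, is to rerun the queries a few times and add the usecounts and objtype columns from sys.dm_exec_cached_plans to the cache query. A usecounts value that climbs with each execution suggests the cached plan is being reused rather than compiled again.</p>
<pre class="brush: sql;">-- Same cache query as before, with the plan type and reuse count added
SELECT st.text, cp.objtype, cp.usecounts,
     qp.query_plan.value('
       declare default element
       namespace &quot;http://schemas.microsoft.com/sqlserver/2004/07/showplan&quot;;
       (//StmtSimple/@StatementOptmLevel)[1]','varchar(20)') AS OptimisationLevel
    FROM sys.dm_exec_cached_plans cp
        CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
    -- Exclude this diagnostic query itself from the results
    WHERE st.text NOT LIKE '%sys.dm_exec_cached_plans%'
    ORDER BY cp.usecounts DESC;</pre>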
]]></content:encoded>
			<wfw:commentRss>http://sqlinthewild.co.za/index.php/2009/12/08/are-trivial-plans-cached/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
