It’s about time I picked this series up again.
I’m not going to go into too much detail on joins. There are some very good articles elsewhere on joins. The important thing to notice about joins, in the context of an execution plan, is that there are six logical join operators and three physical join operators. The logical operators are what you ask for in the context of the query, the physical operators are what the optimiser picks to do the join.
The six logical operators are:
- Inner Join
- Outer Join
- Cross Join
- Cross Apply (new in SQL 2005)
- Semi-Join
- Anti Semi-Join
Craig Freedman wrote a long article on the logical join operators – Introduction to Joins
The semi-joins are the exception, in that they cannot be specified in a query. Nonetheless, they are present in disguise. They’re the logical operators for EXISTS, IN, NOT EXISTS and NOT IN. They’re used when matching is required, but not a complete join.
The three physical operators are what the optimiser uses to evaluate the logical join. There are various conditions that affect the physical operator the will be used for a particular join. The three operators are:
- Nested Loop Join
- Merge Join
- Hash join
Nested Loop
The nested loop join works by looping through all the rows of one input and for each row looping through all the rows of the other input, looking for matches. The nested loop join works best when one or both of the input row sets is small. Since the input that is chosen as the second is read as many times as there are rows in the outer, this join can get very expensive as the size of the inputs increases.
For more detail on the nested loop, see Craig Freedman’s post
Merge Join
The merge join works by running through the two inputs, comparing rows and outputting matched rows. Both inputs must be sorted on the joining columns for this join to be possible. Since both inputs are only read once, this is an efficient join for larger row sets. This efficiency may be offset by the sorted requirement. If the join column is not indexed so as to retrieve the data already sorted, then an explicit sort is required.
For more detail on the merge join, see Craig Freedman’s post
Hash Join
The hash join is one of the more expensive join operations, as it requires the creation of a hash table to do the join. That said, it’s the join that’s best for large, unsorted inputs. It is the most memory-intensive of any of the joins
The hash join first reads one of the inputs and hashes the join column and puts the resulting hash and the column values into a hash table built up in memory. Then it reads all the rows in the second input, hashes those and checks the rows in the resulting hash bucket for the joining rows.
For more detail on the hash join, see Craig Freedman’s post
That’s pretty much that on joins. Next up, aggregates.
Pingback: Some interesting articles.. « QuantuMatrix’s Weblog
Is the hash join not similar to the nested loop in that it compares each hashed value to each row in the hash table ? ie does it not have to loop to do this ? Trying to understand the difference besides the fact that one hashes the values whereas the other compares the actual values.
Not quite
The one resultset is used to form a hash table (wikipedia should have a good coverage of how a hash table works), then the second resultset is hashed to figure out which bucket the value belongs in. It’s a single pass over each resultset.
Nested loop is an O(n^2) algorithm, hash search is much lower than that.
Great intro – just what I needed.
would you tell me what is the exclamation mark some times is seen in nested loops inner join pls?
If you hover the mouse over, the tooltip will tell you what the problem is.
Pingback: Merge join Vs Hash join Vs Nested loop join « SqlserverBlogForum
I know this is old, but thanks for posting this, and for all the info you’ve contributed to the community in various forms. So much good stuff. 🙂
Which one is Faster in which conditions ?
Are their performance depend on
1.)volume of data in 2 joining tables
2.)the way rows are sorted in tbl
3.)indexed joining columns
Which of what is faster?
For logical join types, you choose the type that gets you the results you want. For the physical, the query optimiser chooses the type based on a bunch of things, some of which I discussed in this post.