On to aggregates.
I’m not going to dwell too long on the logical operators for aggregates, as they aren’t as interesting as the joins were. Rather I’m going to concentrate on the physical operators involved.
The two physical operators used for aggregates (and a couple of other logical operations) are:
- Stream Aggregate
- Hash Aggregate
Stream Aggregate
The stream aggregate is the faster and more efficient of the aggregate operators. For it to be used, the input rowset must be sorted in order of the grouping columns.
The stream aggregate works by taking the input rowset and aggregating the values rows by row. When the value of the grouping columns changes, the aggregate value and grouping column values are added to the output rowset and calculation begins for the next grouping value. This is possible only because the input rowset is sorted by the grouping columns.
Sometimes, especially for smaller row sets, the optimiser will decide to sort the row set so that it can do a stream aggregate. Sorts are expensive in themselves, so this isn’t often done.
Hash Aggregate
If the input row set is not sorted, then the stream aggregate cannot be used. In this case, the has aggregate is used instead. The hash aggregate works by creating a hash table (much as for a hash join) and placing the rows into the appropriate hash buckets. Once this is done, the query processor goes through the hash table one bucket at a time, checks the values of the grouping columns (because hash values are not guaranteed unique for different input values) and computes the aggregates for each value of the grouping columns.
As can be imagined, this is a lot more expensive than a stream aggregate and it takes large amounts of memory (for large row sets). If there is insufficient available memory, the hash table can spool to TempDB, causing increased IOs. Because of this, it can be advantageous , especially for larger queries, to create indexes to support the aggregate operations, and allow the optimiser to choose a stream aggregate over the more expensive hash aggregate.
That’s pretty much that for aggregates. Next I’m going to look at some miscellaneous operators
Great article, need to ask is a HASH MATCH the SAME as a HASH AGGREGATE? Also sometimes when I do a Sum and Group by neither of the above are used instead it does a scalar compute?
A hash match is one of the algorithms used to do a join, and sometimes used to do a union. Hash aggregate is used to do an aggregate. It’s much the same algorithm, just used in different ways.
See the post on joins for the hash match – https://www.sqlinthewild.co.za/index.php/2007/12/30/execution-plan-operations-joins/