<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BI Monkey &#187; T-SQL</title>
	<atom:link href="http://www.bimonkey.com/tag/t-sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bimonkey.com</link>
	<description>James Beresford on Microsoft BI and Consulting in Sydney, Australia</description>
	<lastBuildDate>Mon, 23 Jan 2012 22:01:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>Ranking and Numbering rows &#8211; and subsets of rows &#8211; in T-SQL</title>
		<link>http://www.bimonkey.com/2011/05/ranking-and-numbering-rows-and-subsets-of-rows-in-t-sql/</link>
		<comments>http://www.bimonkey.com/2011/05/ranking-and-numbering-rows-and-subsets-of-rows-in-t-sql/#comments</comments>
		<pubDate>Thu, 19 May 2011 11:25:48 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Pivot]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=901</guid>
		<description><![CDATA[I recently had to deal with a scenario where I needed to pivot out some rows after ordering (ranking) them according to specific rules so I could present some rows of data as columns, but in a specific order (don&#8217;t ask why, it&#8217;ll make me grind my teeth about data analysts that don&#8217;t understand how [...]]]></description>
			<content:encoded><![CDATA[<p>I recently had to deal with a scenario where I needed to pivot out some rows after ordering (ranking) them according to specific rules so I could present some rows of data as columns, but in a specific order (don&#8217;t ask why, it&#8217;ll make me grind my teeth about data analysts that don&#8217;t understand how to analyse data&#8230;). The ordering in itself was only part of the solution, as to Pivot the data, the keys need to be specified in the query, so the natural keys can&#8217;t be used. The scenario is set out below:</p>
<div class="wp-caption alignnone" style="width: 639px"><img class=" " title="Fig 1: Rank and Pivot. The Rank column needed to be added" src="http://www.bimonkey.com/uploads/sql/ROW_NUMBER_1.jpg" alt="Fig 1: Rank and Pivot. The Rank column needed to be added" width="629" height="146" /><p class="wp-caption-text">Fig 1: Rank and Pivot. The Rank column needed to be added</p></div>
<p>My first thought was that I&#8217;d have to solve this with a cursor, which wasn&#8217;t a practical option as there were 1.5m rows of data to process, and if my solution involves a cursor I instantly think it&#8217;s a lousy solution. However I was pleased to discover the T-SQL function <strong>ROW_NUMBER() </strong>which allows you to add row numbering to ordered data and even subgroups of that data. (The below samples use the AdventureWorks2008 database.)</p>
<p>First up, basic row numbering:</p>
<blockquote><p><span style="color: #0000ff;">SELECT</span> <span style="color: #ff00ff;"> ROW_NUMBER</span>() <span style="color: #0000ff;">OVER</span> (<span style="color: #0000ff;">ORDER BY</span> ProductId) <span style="color: #0000ff;">AS</span> ID_Key<br />
,        [ProductID]<br />
,        [LocationID]<br />
,        [Shelf]<br />
,        [Bin]<br />
,        [Quantity]</p>
<p><span style="color: #0000ff;">FROM</span> [Production].[ProductInventory]</p>
<p><span style="color: #0000ff;">WHERE</span> [ProductID] <span style="color: #808080;">IN</span> (1,2,3,4)</p></blockquote>
<p>The above query adds an ID key to the data based on ordering by the ProductID field. The <strong>ROW_NUMBER()</strong> function requires an <strong>OVER</strong> clause to know on what basis it should assign the key, and that clause must contain an ORDER BY. The end result looks like this:</p>
<div class="wp-caption alignnone" style="width: 356px"><img title="Fig 2: Simple row numbering" src="http://www.bimonkey.com/uploads/sql/ROW_NUMBER_2.jpg" alt="Fig 2: Simple row numbering" width="346" height="285" /><p class="wp-caption-text">Fig 2: Simple row numbering</p></div>
<p>You can extend this to order within a subgroup by specifying a <strong>PARTITION BY</strong> clause, so that ROW_NUMBER() restarts its numbering within each subgroup. In the example below I partition by ProductId:</p>
<blockquote><p><span style="color: #0000ff;"> SELECT</span> <span style="color: #ff00ff;">ROW_NUMBER</span>() <span style="color: #0000ff;">OVER</span> (<span style="color: #0000ff;">PARTITION BY</span> ProductId <span style="color: #0000ff;">ORDER BY</span> Quantity<span style="color: #0000ff;"> DESC</span>) <span style="color: #0000ff;">AS</span> Subset_ID_Key<br />
,        [ProductID]<br />
,        [LocationID]<br />
,        [Shelf]<br />
,        [Bin]<br />
,        [Quantity]</p>
<p><span style="color: #0000ff;">FROM</span> [Production].[ProductInventory]</p>
<p><span style="color: #0000ff;">WHERE</span> [ProductID] <span style="color: #808080;">IN</span> (1,2,3,4)</p></blockquote>
<p>Which yields this result, with the ranking now only applying within a Product Id:</p>
<div class="wp-caption alignnone" style="width: 396px"><img title="Fig 3: Row numbering within a Subgroup" src="http://www.bimonkey.com/uploads/sql/ROW_NUMBER_3.jpg" alt="Fig 3: Row numbering within a Subgroup" width="386" height="280" /><p class="wp-caption-text">Fig 3: Row numbering within a Subgroup</p></div>
<p>Which can then be pivoted on the rank, as the key of the rank is now known:</p>
<blockquote><p><span style="color: #0000ff;">SELECT</span> ProductID<br />
,        [1] <span style="color: #0000ff;">AS</span> Bin_1<br />
,        [2] <span style="color: #0000ff;">AS</span> Bin_2<br />
,        [3] <span style="color: #0000ff;">AS</span> Bin_3</p>
<p><span style="color: #0000ff;">FROM</span></p>
<p>(</p>
<p><span style="color: #0000ff;">SELECT</span> <span style="color: #ff00ff;">ROW_NUMBER</span>()<span style="color: #0000ff;"> OVER</span> (<span style="color: #0000ff;">PARTITION BY</span> ProductId <span style="color: #0000ff;">ORDER BY</span> Quantity <span style="color: #0000ff;">DESC</span>) <span style="color: #0000ff;">AS</span> Subset_ID_Key<br />
,        [ProductID]<br />
,        [Bin]</p>
<p><span style="color: #0000ff;">FROM</span> [Production].[ProductInventory]</p>
<p><span style="color: #0000ff;">WHERE</span> [ProductID] <span style="color: #808080;">IN</span> (1,2,3,4)</p>
<p>) <span style="color: #0000ff;">AS</span> Pivot_Source</p>
<p><span style="color: #808080;">PIVOT</span></p>
<p>(<br />
<span style="color: #ff00ff;">MAX</span>(Bin)<br />
<span style="color: #0000ff;">FOR</span> Subset_ID_Key <span style="color: #808080;">IN</span> ([1],[2],[3])<br />
) <span style="color: #0000ff;">AS</span> Pivot_Output</p></blockquote>
<p>Which yields this final output:</p>
<div class="wp-caption alignnone" style="width: 248px"><img title="Fig 4: Ranked and Pivoted" src="http://www.bimonkey.com/uploads/sql/ROW_NUMBER_4.jpg" alt="Fig 4: Ranked and Pivoted" width="238" height="129" /><p class="wp-caption-text">Fig 4: Ranked and Pivoted</p></div>
<p>All done within a single query, and not a cursor in sight. ROW_NUMBER() was a great function to discover!</p>
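<p>For anyone who wants to try the rank-then-pivot pattern without an AdventureWorks install, here is a minimal sketch using Python and its built-in sqlite3 module (SQLite 3.25 or later is assumed, for window function support). The table contents are made up, and because SQLite has no PIVOT operator the pivot step uses MAX(CASE ...) per rank instead, which should give the same shape of result as the T-SQL above.</p>

```python
import sqlite3

# Hypothetical inventory rows: (ProductID, Bin, Quantity). Not AdventureWorks data.
rows = [
    (1, 1, 408), (1, 6, 324), (1, 9, 353),
    (2, 2, 427), (2, 5, 318), (2, 7, 364),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductInventory (ProductID INTEGER, Bin INTEGER, Quantity INTEGER)")
conn.executemany("INSERT INTO ProductInventory VALUES (?, ?, ?)", rows)

# Inner query: number rows within each product, highest quantity first.
# Outer query: pivot on that row number with conditional aggregation,
# a portable stand-in for the T-SQL PIVOT operator.
query = """
SELECT ProductID,
       MAX(CASE WHEN Subset_ID_Key = 1 THEN Bin END) AS Bin_1,
       MAX(CASE WHEN Subset_ID_Key = 2 THEN Bin END) AS Bin_2,
       MAX(CASE WHEN Subset_ID_Key = 3 THEN Bin END) AS Bin_3
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Quantity DESC) AS Subset_ID_Key,
           ProductID,
           Bin
    FROM ProductInventory
) AS Pivot_Source
GROUP BY ProductID
ORDER BY ProductID
"""
pivoted = conn.execute(query).fetchall()
print(pivoted)
```

<p>Each output row holds a product followed by its bins in descending order of quantity, one column per rank.</p>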
<p>MSDN Documentation is here for:</p>
<ul>
<li><a title="ROW_NUMBER" href="http://msdn.microsoft.com/en-us/library/ms186734.aspx">ROW_NUMBER()</a> &#8211; the key function</li>
<li><a title="OVER Clause" href="http://msdn.microsoft.com/en-us/library/ms189461.aspx">OVER</a> &#8211; ordering and subgrouping the results of ROW_NUMBER</li>
<li><a title="Using PIVOT and UNPIVOT" href="http://msdn.microsoft.com/en-us/library/ms177410.aspx">PIVOT</a> &#8211; for pivoting out the results</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/05/ranking-and-numbering-rows-and-subsets-of-rows-in-t-sql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Use the Index, Luke</title>
		<link>http://www.bimonkey.com/2011/01/use-the-index-luke/</link>
		<comments>http://www.bimonkey.com/2011/01/use-the-index-luke/#comments</comments>
		<pubDate>Sun, 30 Jan 2011 06:01:59 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=890</guid>
		<description><![CDATA[While I&#8217;m on a T-SQL bent, I stumbled upon (literally, beware of StumbleUpon as a way to waste heroic amounts of time) a free eBook called &#8220;Use The Index, Luke&#8221;. It&#8217;s by a chap called Markus Winand, who I freely admit to having never heard of (and Google isn&#8217;t particularly enlightening). However it&#8217;s a work [...]]]></description>
			<content:encoded><![CDATA[<p>While I&#8217;m on a T-SQL bent, I stumbled upon (literally, beware of StumbleUpon as a way to waste heroic amounts of time) a free eBook called <a title="Use The Index, Luke | e-Book about SQL Indexing in Oracle, SQL Server" href="http://use-the-index-luke.com/">&#8220;Use The Index, Luke&#8221;</a>. It&#8217;s by a chap called Markus Winand, who I freely admit to having never heard of (and Google isn&#8217;t particularly enlightening).</p>
<p>However, it&#8217;s a work in progress explaining in relatively clear terms what database indexes are, how they work and how they can impact performance. It&#8217;s a pretty important topic for anyone who writes SQL &#8211; so I&#8217;d recommend reading what&#8217;s there and tracking its progress.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/01/use-the-index-luke/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Why to avoid DISTINCT and GROUP BY to get unique records</title>
		<link>http://www.bimonkey.com/2011/01/why-to-avoid-distinct-and-group-by-to-get-unique-records/</link>
		<comments>http://www.bimonkey.com/2011/01/why-to-avoid-distinct-and-group-by-to-get-unique-records/#comments</comments>
		<pubDate>Sat, 29 Jan 2011 10:45:47 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=886</guid>
		<description><![CDATA[This is a quick and dirty post on the use of DISTINCT or GROUP BY to get unique records, based on something I helped a developer with over the last couple of weeks. Their thought process was that because they were getting duplicate records, the easiest way to get rid of them was to slap [...]]]></description>
			<content:encoded><![CDATA[<p>This is a quick and dirty post on the use of DISTINCT or GROUP BY to get unique records, based on something I helped a developer with over the last couple of weeks.</p>
<p>Their thought process was that because they were getting duplicate records, the easiest way to get rid of them was to slap a DISTINCT at the start of the query to get unique results. Which, in a sense, is OK &#8211; because it worked (sort of). However, there are two very good reasons why this is not always a good approach.</p>
<h2>#1: Your query is wrong</h2>
<p>If you are getting back duplicate records, what it probably means is that you are really doing your query wrong. The below example is an admittedly imperfect example of this &#8211; as the first query returns far more than intended &#8211; but was close to what I was dealing with:</p>
<blockquote><p>USE AdventureWorks</p>
<p><span style="color: #008000;">/* Query 1: Using DISTINCT to try to eliminate duplicates */</span></p>
<p>select    DISTINCT<br />
s.Name,<br />
CASE<br />
WHEN sc.ContactTypeID = 11 THEN 'Y'<br />
ELSE 'N'<br />
END    AS    'OwnerContact'<br />
from    Sales.Store s<br />
left join Sales.StoreContact sc<br />
ON s.CustomerID = sc.CustomerID</p>
<p><span style="color: #008000;">/* Query 2: Using a properly formed WHERE clause. Filtering on sc in the WHERE clause discards unmatched stores anyway, so the join is written as an inner join */</span></p>
<p>select    s.Name,<br />
'Y' AS    'OwnerContact'<br />
from    Sales.Store s<br />
inner join Sales.StoreContact sc<br />
ON s.CustomerID = sc.CustomerID<br />
WHERE sc.ContactTypeID = 11</p></blockquote>
<p>What I&#8217;m trying to illustrate with the example above is that if you consider more carefully what records you are bringing back in your joins, you are less likely to end up with duplicates. Making sure you only join to tables in such a way as to bring back the data you need also reduces the risk of other errors creeping in.</p>
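<p>To make the difference concrete, here is a small sketch in Python using the built-in sqlite3 module rather than SQL Server. The two tables and their rows are invented stand-ins for Sales.Store and Sales.StoreContact; only the shape of the two queries follows the example above.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store (CustomerID INTEGER, Name TEXT);
CREATE TABLE StoreContact (CustomerID INTEGER, ContactTypeID INTEGER);
INSERT INTO Store VALUES (1, 'Bike World'), (2, 'Cycle Shed');
-- Store 1 has an owner (type 11) plus two other contacts; store 2 has no owner.
INSERT INTO StoreContact VALUES (1, 11), (1, 14), (1, 15), (2, 14);
""")

# Query 1 pattern: DISTINCT collapses the repeated 'N' rows, but 'Bike World'
# still comes back twice, once flagged 'Y' and once flagged 'N'.
distinct_rows = conn.execute("""
    SELECT DISTINCT s.Name,
           CASE WHEN sc.ContactTypeID = 11 THEN 'Y' ELSE 'N' END AS OwnerContact
    FROM Store s
    LEFT JOIN StoreContact sc ON s.CustomerID = sc.CustomerID
    ORDER BY s.Name, OwnerContact
""").fetchall()

# Query 2 pattern: only join to the contact rows that are actually wanted.
filtered_rows = conn.execute("""
    SELECT s.Name, 'Y' AS OwnerContact
    FROM Store s
    INNER JOIN StoreContact sc ON s.CustomerID = sc.CustomerID
    WHERE sc.ContactTypeID = 11
    ORDER BY s.Name
""").fetchall()

print(distinct_rows)
print(filtered_rows)
```

<p>With this made-up data the DISTINCT query still returns three rows, two of them for the same store, while the filtered query returns the single owner row.</p>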
<h2>#2: Performance</h2>
<p>If you are getting duplicate records, you are bringing back more data than you need. On top of this the DISTINCT or GROUP BY operations are having to go over the whole returned data set to identify unique records. This can get pretty expensive pretty quickly in terms of database operations.</p>
<p>From my perspective badly performing queries are a lesser sin than incorrect ones. I doubt many business users will be making decisions based on query length, but they will on the data you serve up to them.</p>
<h2>Summing up</h2>
<p>All I want to do in this post is make you pause and think before doing a DISTINCT or GROUP BY purely to eliminate duplicates &#8211; especially if you don&#8217;t really understand why you are getting them. A better designed and more accurate query can often get rid of the dupes and cut off the risk of bad data in the future.</p>
<p><em>Update 25 Jul 2011:</em> Mark Caldwell articulates it better than me @ SQL Team Blog: <a title="Why I Hate DISTINCT" href="http://weblogs.sqlteam.com/markc/archive/2008/11/11/60752.aspx">Why I Hate DISTINCT</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/01/why-to-avoid-distinct-and-group-by-to-get-unique-records/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An SQL alternative to the SCD</title>
		<link>http://www.bimonkey.com/2010/05/an-sql-alternative-to-the-scd/</link>
		<comments>http://www.bimonkey.com/2010/05/an-sql-alternative-to-the-scd/#comments</comments>
		<pubDate>Tue, 11 May 2010 10:47:34 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[merge]]></category>
		<category><![CDATA[Slowly Changing Dimension]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=745</guid>
		<description><![CDATA[In SQL 2008 a new T-SQL construct was added - the MERGE operation. (Ok, pedants will know this wasn&#8217;t new to Oracle,  but it was new to SQL Server). This operation allows for the merging of a dataset into a reference dataset &#8211; which can be remarkably similar to Insert / Update operations effected by the Slowly Changing Dimension [...]]]></description>
			<content:encoded><![CDATA[<p>In SQL 2008 a new T-SQL construct was added - the <a title="MERGE (Transact-SQL)" href="http://technet.microsoft.com/en-us/library/bb510625.aspx">MERGE</a> operation. (Ok, pedants will know this wasn&#8217;t new to Oracle,  but it was new to SQL Server).</p>
<p>This operation allows for the merging of a dataset into a reference dataset &#8211; which can be remarkably similar to Insert / Update operations effected by the <a title="Slowly Changing Dimension Transformation" href="http://msdn.microsoft.com/en-us/library/ms141715.aspx">Slowly Changing Dimension</a> transformation. However the way it operates is very different. Instead of the SCD&#8217;s row by row evaluation approach, the MERGE operation is a set based operation. What this means is it compares the whole of the source dataset to the reference dataset in a single pass. This has significant implications for performance &#8211; on a site where I implemented this, the operation which took 1,200 seconds in the SCD was cut down to 51 seconds using a Merge.</p>
<p>There are limitations and differences to be aware of:</p>
<ul>
<li>You cannot directly return row counts for Insert / Update / Ignore operations in the Merge</li>
<li>As it is a bulk operation, a single bad row will cause failure of the whole batch</li>
<li>There&#8217;s no GUI &#8211; just hand crafted SQL</li>
<li>Less error trapping / logging options</li>
<li>More flexibility in terms of actions when matches / non matches are found</li>
</ul>
<p>The main reason why you would consider the SQL Merge &#8211; it handles Type 1, and with a little cunning, Type 2 dimensions &#8211; in a fraction of the time it takes the SCD to plod through. It&#8217;s still not as fast as a proper in memory comparison using something such as <a title="TableDifference" href="http://www.cozyroc.com/ssis/table-difference">TableDifference</a> &#8211; but it&#8217;s always good to know you have something else available in your toolkit.</p>
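<p>To show what a set based Type 1 load looks like in miniature, here is a rough sketch in Python with the built-in sqlite3 module. SQLite has no MERGE statement, so this swaps in its INSERT ... ON CONFLICT upsert (SQLite 3.24 or later is assumed) as a stand-in; it is not T-SQL MERGE syntax, and the DimCustomer and StageCustomer tables are hypothetical.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE StageCustomer (CustomerKey INTEGER PRIMARY KEY, City TEXT);
INSERT INTO DimCustomer VALUES (1, 'Sydney'), (2, 'Perth');
-- Customer 2 has moved; customer 3 is new.
INSERT INTO StageCustomer VALUES (2, 'Hobart'), (3, 'Darwin');
""")

# One set based statement: unmatched rows are inserted, matched rows are
# overwritten (Type 1 behaviour). SQLite needs the "WHERE true" so its parser
# can tell the upsert clause apart from a join when using INSERT ... SELECT.
conn.execute("""
    INSERT INTO DimCustomer (CustomerKey, City)
    SELECT CustomerKey, City FROM StageCustomer WHERE true
    ON CONFLICT (CustomerKey) DO UPDATE SET City = excluded.City
""")

dim_rows = conn.execute(
    "SELECT CustomerKey, City FROM DimCustomer ORDER BY CustomerKey"
).fetchall()
print(dim_rows)
```

<p>The whole staging set is compared to the dimension in one statement, with no per-row loop. Type 2 handling (expiring the old row and inserting a new version) is left out of this sketch.</p>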
<p>Further information:</p>
<ul>
<li><a title="Using the SQL MERGE Statement for Slowly Changing Dimension Processing" href="http://www.kimballgroup.com/html/08dt/KU107_UsingSQL_MERGESlowlyChangingDimension.pdf">Using the SQL MERGE Statement for Slowly Changing Dimension Processing</a> &#8211; from the Kimball Group</li>
<li><a title="Alternatives to SSIS SCD Wizard Component" href="http://bennyaustin.wordpress.com/2010/05/29/alternatives-to-ssis-scd-wizard-component/">How to create type 1 &amp; 2 SCD&#8217;s using standard SSIS components (other than the SCD)</a> (at the bottom of the post) &#8211; <a title="Benny Austin" href="http://bennyaustin.wordpress.com/">Benny Austin</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/05/an-sql-alternative-to-the-scd/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Multiple LIKE clauses in a single WHERE statement</title>
		<link>http://www.bimonkey.com/2010/01/multiple-like-clauses-in-a-single-where-statement/</link>
		<comments>http://www.bimonkey.com/2010/01/multiple-like-clauses-in-a-single-where-statement/#comments</comments>
		<pubDate>Thu, 28 Jan 2010 07:20:31 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=723</guid>
		<description><![CDATA[I recently came up against a scenario where, in amongst a number of other filters, I had to deal with a couple of wildcard criteria. As there is no option to have multiple LIKE clauses in native SQL, such as WHERE Field LIKE IN ('%Option1%','%Option2%'), I feared I was stuck with having to duplicate my [...]]]></description>
			<content:encoded><![CDATA[<p>I recently came up against a scenario where, in amongst a number of other filters, I had to deal with a couple of wildcard criteria. As there is no option to have multiple LIKE clauses in native SQL, such as WHERE Field LIKE IN ('%Option1%','%Option2%'), I feared I was stuck with having to duplicate my WHERE clause in its entirety for each LIKE operation &#8211; until I found this cunning bit of code (from the thread <a title="SELECT multiple LIKE clauses and return how many columns match" href="http://www.eggheadcafe.com/software/aspnet/30423583/select-multiple-like-clau.aspx">here</a>):</p>
<blockquote><p><span style="color: #339966;">/* Apply Multiple LIKE clauses in a single WHERE clause */</span><br />
<span style="color: #0000ff;">SELECT </span>*<br />
 <br />
<span style="color: #0000ff;">FROM</span> [AdventureWorks].[Person].[Contact]<br />
 <br />
<span style="color: #0000ff;">WHERE</span> CASE<br />
  <span style="color: #0000ff;">WHEN</span> FirstName <span style="color: #808080;">LIKE</span> <span style="color: #ff0000;">'Gusta%'</span> <span style="color: #0000ff;">THEN</span> 1<br />
  <span style="color: #0000ff;">WHEN</span> FirstName <span style="color: #808080;">LIKE</span> <span style="color: #ff0000;">'Cath%'</span> <span style="color: #0000ff;">THEN</span> 1<br />
  <span style="color: #0000ff;">END</span> = 1</p></blockquote>
<p>An elegant solution!</p>
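<p>Here is a quick way to see the CASE trick sitting alongside another filter, sketched in Python with the built-in sqlite3 module and a made-up Contact table (the column names are assumptions for illustration). It also runs the same filter with the LIKE tests grouped in parentheses and joined by OR, which should return the same rows.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contact (FirstName TEXT, EmailPromotion INTEGER)")
conn.executemany(
    "INSERT INTO Contact VALUES (?, ?)",
    [("Gustavo", 1), ("Catherine", 1), ("Cathy", 0), ("Kim", 1)],
)

# The CASE form from the post, combined with one other filter.
case_rows = conn.execute("""
    SELECT FirstName FROM Contact
    WHERE EmailPromotion = 1
      AND CASE
            WHEN FirstName LIKE 'Gusta%' THEN 1
            WHEN FirstName LIKE 'Cath%' THEN 1
          END = 1
    ORDER BY FirstName
""").fetchall()

# The same filter with the LIKE tests grouped by parentheses and OR.
or_rows = conn.execute("""
    SELECT FirstName FROM Contact
    WHERE EmailPromotion = 1
      AND (FirstName LIKE 'Gusta%' OR FirstName LIKE 'Cath%')
    ORDER BY FirstName
""").fetchall()

print(case_rows)
print(or_rows)
```

<p>Rows that match neither pattern make the CASE expression evaluate to NULL, so the comparison with 1 fails and they drop out.</p>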
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/01/multiple-like-clauses-in-a-single-where-statement/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Handling Recursive Hierarchies in SQL Server</title>
		<link>http://www.bimonkey.com/2009/09/handling-recursive-hierarchies-in-sql-server/</link>
		<comments>http://www.bimonkey.com/2009/09/handling-recursive-hierarchies-in-sql-server/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 06:59:32 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Recursive Hierarchies]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=596</guid>
		<description><![CDATA[I was recently posed a question on handling recursive hierarchies which left me completely stumped, so I had to find a good solution to it. This post will cover all that, but first - &#8230;what is a Recursive Hierarchy? A recursive hierarchy is one where children of parent members can be parents themselves, such [...]]]></description>
			<content:encoded><![CDATA[<p>I was recently posed a question on handling recursive hierarchies which left me completely stumped, so I had to find a good solution to it. This post will cover all that, but first -</p>
<h2>&#8230;what is a Recursive Hierarchy?</h2>
<p>A recursive hierarchy is one where children of parent members can be parents themselves, such as an Organisation chart, or chart of accounts. A simple example is shown below, where the node A2 is both a <span style="text-decoration: underline;">child</span> of A1 and a <span style="text-decoration: underline;">parent</span> of A4 &amp; A5:</p>
<div class="wp-caption alignnone" style="width: 307px"><img title="A Simple Recursive Hierarchy" src="http://www.bimonkey.com/uploads/sql/recursive1.jpg" alt="b" width="297" height="120" /><p class="wp-caption-text">Fig 1: A Simple Recursive Hierarchy</p></div>
<p>Handling these within a database environment can be difficult because the number of parent / child relationships (aka &#8220;Depth&#8221;) can vary, so it is impossible to create a fixed width table to accommodate them and their future growth. Consequently, most operational systems store these in a simple two column table which records the parent / child relationships only, as shown below.</p>
<div class="wp-caption alignnone" style="width: 152px"><img title="The Parent Child Table" src="http://www.bimonkey.com/uploads/sql/recursive2.jpg" alt="b" width="142" height="142" /><p class="wp-caption-text">Fig 2: The Parent Child Table</p></div>
<p>From the operational system&#8217;s point of view this is usually all it needs to function, as it usually only needs to know the relationship one step in either direction, which this table satisfies. The parent of A2 or the children of A2 can be determined with a simple SELECT query.</p>
<h2>How do they cause difficulties for reporting?</h2>
<p>The difficulty faced in a SQL based reporting situation is that you cannot easily determine relationships between parents and grandchildren, great-grandchildren, etc. without nesting queries. The specific problem with this is that you cannot know from a parent exactly how many levels of children lie below it. Vice versa, you cannot know how many levels of parents a child record has. Consequently you cannot know in advance how deep to nest your queries. From a practical standpoint as well, these relationships could be hundreds or thousands deep, and writing that query can pose problems in itself, even if you know there are exactly 1,567 levels in it!</p>
<p>The problem I was posed was superficially simple &#8211; if every level of such a hierarchy can have values, such as laid out below, how do you determine the aggregate value for a given parent and all its children?</p>
<div class="wp-caption alignnone" style="width: 153px"><img title="The Values Table" src="http://www.bimonkey.com/uploads/sql/recursive3.jpg" alt="b" width="143" height="162" /><p class="wp-caption-text">Fig 3: The Values Table</p></div>
<p>For a tiny example such as this, nested queries are an option &#8211; your sub selects would only have to go two deep at most. But what if the depth changed, or ran into the tens or hundreds? Also, how do you create the generic query for any node in the tree? And for added complexity, what if there are multiple trees in the hierarchy? The problems are not trivial to resolve, but I have located two good solutions.</p>
<h2>Solution 1: The LR Method</h2>
<p>The first, and in my view most elegant, comes from Michael J. Kamfonas in his article &#8220;<a title="Recursive Hierarchies: The Relational Taboo!" href="http://kamfonas.com/id3.html">Recursive Hierarchies: The Relational Taboo!</a>&#8221;, which I strongly recommend reading to fully grasp the solution. His solution is to pre-number each node in the hierarchy with a value that on the Left forms a lower bound and on the Right an upper bound that allows you to select all nodes under it using a BETWEEN clause. A picture explains this concept clearly, with the L Value in Pink and the R Value in Green:</p>
<div class="wp-caption alignnone" style="width: 313px"><img title="The LR Method applied to a Recursive Hierarchy" src="http://www.bimonkey.com/uploads/sql/recursive4.jpg" alt="b" width="303" height="122" /><p class="wp-caption-text">Fig 4: The LR Method applied to a Recursive Hierarchy</p></div>
<p>So as you can see for node A2, the range between its L Value of 2 and R Value of 7 encompasses the L Values of all its children (A4 with 3, and A5 with 5). So to select all the nodes below A2 there is no need to resolve any relationships, you can simply select all children nodes based on their L values.</p>
<p>In database terms, the output looks like this:</p>
<div class="wp-caption alignnone" style="width: 232px"><img title="The L R Table" src="http://www.bimonkey.com/uploads/sql/recursive5.jpg" alt="b" width="222" height="163" /><p class="wp-caption-text">Fig 5: The L R Table</p></div>
<p>So, if I wanted to know the sum of all the values below any given node, I would use this pseudo SQL:</p>
<blockquote><p><span style="color: #0000ff;">SELECT </span>SUM(Value)<br />
<span style="color: #0000ff;">FROM </span>ValuesTable v</p>
<p><span style="color: #999999;">LEFT JOIN</span> LR_Table lr<br />
<span style="color: #0000ff;">ON </span>v.Node = lr.Node</p>
<p><span style="color: #0000ff;">WHERE </span>lr.L_Value<br />
<span style="color: #999999;">BETWEEN </span>(<span style="color: #0000ff;">SELECT </span>L_Value <span style="color: #0000ff;">FROM </span>LR_Table <span style="color: #0000ff;">WHERE </span>Node = <span style="color: #ff0000;">'ParentNode'</span>)<br />
<span style="color: #999999;">AND </span>(<span style="color: #0000ff;">SELECT </span>R_Value <span style="color: #0000ff;">FROM </span>LR_Table <span style="color: #0000ff;">WHERE </span>Node = <span style="color: #ff0000;">'ParentNode'</span>)</p></blockquote>
<p>To demonstrate this in practice, I have created some SQL which creates a recursive hierarchy table as in Fig 2, a values table as in Fig 3 and an LR table as in Fig 5. The LR table is then populated with L/R values by a simple cursor which logs its activity to the message window so you can see what it is doing. I don&#8217;t think it will win any prizes for efficiency, but it works! The code sample then closes out with a T-SQL sample which calculates the sum of a node&#8217;s value and all of its children&#8217;s values. <a title="LR Method Sample SQL (Right Click, Save As)" href="http://www.bimonkey.com/uploads/sql/RecursiveLR.sql">Download the sample code here</a>.</p>
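<p>Separately from that download, here is a minimal sketch of the numbering idea in Python with the built-in sqlite3 module. It is not the cursor-based sample: it assigns L and R values with a depth-first walk, then runs the BETWEEN query. The tree (A1 over A2 and A3, A2 over A4 and A5) and the node values are assumptions chosen to match the L and R values quoted above for A2, A4 and A5.</p>

```python
import itertools
import sqlite3

# Assumed parent/child pairs and node values, for illustration only.
edges = [("A1", "A2"), ("A1", "A3"), ("A2", "A4"), ("A2", "A5")]
values = {"A1": 10, "A2": 20, "A3": 30, "A4": 40, "A5": 50}

children = {}
for parent, child in edges:
    children.setdefault(parent, []).append(child)

lr_rows = []
counter = itertools.count(1)

def number_node(node):
    """Assign L on the way down and R on the way back up (depth-first)."""
    left = next(counter)
    for child in children.get(node, []):
        number_node(child)
    right = next(counter)
    lr_rows.append((node, left, right))

number_node("A1")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LR_Table (Node TEXT, L_Value INTEGER, R_Value INTEGER)")
conn.execute("CREATE TABLE ValuesTable (Node TEXT, Value INTEGER)")
conn.executemany("INSERT INTO LR_Table VALUES (?, ?, ?)", lr_rows)
conn.executemany("INSERT INTO ValuesTable VALUES (?, ?)", list(values.items()))

def subtree_sum(node):
    """Sum the values of a node and everything below it with one BETWEEN."""
    return conn.execute("""
        SELECT SUM(v.Value)
        FROM ValuesTable v
        JOIN LR_Table lr ON v.Node = lr.Node
        WHERE lr.L_Value BETWEEN (SELECT L_Value FROM LR_Table WHERE Node = ?)
                             AND (SELECT R_Value FROM LR_Table WHERE Node = ?)
    """, (node, node)).fetchone()[0]

lr_by_node = {node: (left, right) for node, left, right in lr_rows}
print(lr_by_node["A2"], subtree_sum("A2"))
```

<p>Because the parent&#8217;s own L value falls inside its own range, the sum includes the parent as well as everything below it. The numbering has to be redone whenever the tree changes shape.</p>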
<h2>Solution 2: Kimball Helper Table</h2>
<p>Unsurprisingly, Ralph Kimball also has an approach. His involves the use of a &#8220;Helper Table&#8221; which he describes in his article &#8220;<a title="Helper tables handle dimensions with complex hierarchies" href="http://www.ralphkimball.com/html/articles_search/articles1998/9809d05.html">Helper tables handle dimensions with complex hierarchies</a>&#8221;. As with Solution 1, I advise reading the article, as I will only go into the practical implications of his approach.</p>
<p>The Kimball approach requires creating a table that stores <em>every path from each node in the tree to itself and to every node below it</em>. Looking at the example below, this means creating one row per node, plus one row for each path down the tree for each parent node to all of its children.</p>
<div class="wp-caption alignnone" style="width: 724px"><img title="Paths to capture in a Helper Table" src="http://www.bimonkey.com/uploads/sql/recursive6.jpg" alt="b" width="714" height="209" /><p class="wp-caption-text">Fig 6: Paths to capture in a Helper Table</p></div>
<p>The helper table also includes a few extra flags &#8211; the Level Depth from the top parent, and flags for the top level parent nodes (Topmost) and lowest level children nodes (Lowest). The output table ends up looking like this:</p>
<div class="wp-caption alignnone" style="width: 490px"><img title="The Kimball Helper Table" src="http://www.bimonkey.com/uploads/sql/recursive7.jpg" alt="b" width="480" height="323" /><p class="wp-caption-text">Fig 7: The Kimball Helper Table</p></div>
<p>This makes navigating relative positions in the hierarchy simpler, and makes answering the original question easy, as all that is required is to join the Values table to the Helper table for the given parent node, as below:</p>
<blockquote><p><span style="color: #0000ff;">SELECT </span><span style="color: #ff00ff;">SUM</span>(Value)<br />
<span style="color: #0000ff;">FROM </span>HelperTable r</p>
<p><span style="color: #808080;">LEFT JOIN</span> ValuesTable v<br />
<span style="color: #0000ff;">ON </span>r.ChildNode = v.Node</p>
<p><span style="color: #0000ff;">WHERE </span>r.ParentNode = <span style="color: #ff0000;">'ParentNode'</span></p></blockquote>
<p>However &#8211; as you will see from my <a title="Kimball Helper Table Method Sample SQL (Right Click, Save As)" href="http://www.bimonkey.com/uploads/sql/RecursiveHelper.sql">sample code</a> &#8211; populating these tables is much more complex. The sample code creates and populates the recursive hierarchy table as in Fig 2, a values table as in Fig 3 and a Helper table as in Fig 7. The process of populating the table is a mix of cursors, inserts and updates and is much trickier to get right than the LR method, because of the higher demands for information, such as Depth from Parent.</p>
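<p>As a rough alternative to the cursor approach in that download, the path rows can also be generated with a recursive common table expression. The sketch below uses Python with the built-in sqlite3 module. The hierarchy and values are assumed, every node is assumed to have a row in the values table, and only a depth-from-parent column is produced, so the Topmost and Lowest flags and the depth from the top parent are left out.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ParentChild (ParentNode TEXT, ChildNode TEXT);
CREATE TABLE ValuesTable (Node TEXT, Value INTEGER);
-- Assumed hierarchy and values, for illustration only.
INSERT INTO ParentChild VALUES ('A1','A2'), ('A1','A3'), ('A2','A4'), ('A2','A5');
INSERT INTO ValuesTable VALUES ('A1',10), ('A2',20), ('A3',30), ('A4',40), ('A5',50);

-- Start with one row per node to itself (depth 0), then extend every
-- path one level further down until no more children are found.
CREATE TABLE HelperTable AS
WITH RECURSIVE Paths (ParentNode, ChildNode, DepthFromParent) AS (
    SELECT Node, Node, 0 FROM ValuesTable
    UNION ALL
    SELECT p.ParentNode, pc.ChildNode, p.DepthFromParent + 1
    FROM Paths p
    JOIN ParentChild pc ON pc.ParentNode = p.ChildNode
)
SELECT ParentNode, ChildNode, DepthFromParent FROM Paths;
""")

helper_count = conn.execute("SELECT COUNT(*) FROM HelperTable").fetchone()[0]

# The original question: total value of a node and everything below it.
a2_total = conn.execute("""
    SELECT SUM(v.Value)
    FROM HelperTable r
    JOIN ValuesTable v ON r.ChildNode = v.Node
    WHERE r.ParentNode = 'A2'
""").fetchone()[0]

print(helper_count, a2_total)
```

<p>For this five node tree the helper table ends up with eleven rows: five self paths, four parent to child paths and two grandparent to grandchild paths. That illustrates how quickly the table grows with depth.</p>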
<h2>Recursive Hierarchies &#8211; no problem!</h2>
<p>Above are two solid approaches to the Recursive Hierarchy problem. The code samples provided should help you get started on understanding how to handle these scenarios when trying to report against them in Database based reporting scenarios such as in SSRS. SSAS and other OLAP based reporting handles all of this in its stride however, as <a title="The Parent - Child Dimension" href="http://blogs.microsoft.co.il/blogs/barbaro/archive/2007/10/09/the-parent-child-dimension.aspx">described here, for example</a>.</p>
<p>If you find issues with my code samples or see improvements, I&#8217;d love to know about them, so please keep me posted about your experiences with them.</p>
<p>Finally, I will close out with the old joke on the subject:</p>
<blockquote><p><em>To truly understand recursion,</em></p>
<p><em>you must first understand recursion</em></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2009/09/handling-recursive-hierarchies-in-sql-server/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>10 Essential SQL Tips for Developers</title>
		<link>http://www.bimonkey.com/2009/09/10-essential-sql-tips-for-developers/</link>
		<comments>http://www.bimonkey.com/2009/09/10-essential-sql-tips-for-developers/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 22:13:07 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=599</guid>
		<description><![CDATA[10 Essential SQL Tips for Developers courtesy of Eric Shafer &#8211; covers some important basics &#8211; if you scan through that list and don&#8217;t understand all 10 points, make an effort to ensure you do &#8211; they aren&#8217;t mind boggling obscurities but important basics.]]></description>
			<content:encoded><![CDATA[<p><a title="10 Essential SQL Tips for Developers" href="http://net.tutsplus.com/tutorials/other/10-essential-sql-tips-for-developers/">10 Essential SQL Tips for Developers</a>, courtesy of Eric Shafer, covers some important basics. If you scan through that list and don&#8217;t understand all 10 points, make an effort to ensure you do &#8211; they aren&#8217;t mind-boggling obscurities but fundamentals.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2009/09/10-essential-sql-tips-for-developers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Count the number of rows in every Table in a Database in no time!</title>
		<link>http://www.bimonkey.com/2009/06/count-the-number-of-rows-in-every-table-in-a-database-in-no-time/</link>
		<comments>http://www.bimonkey.com/2009/06/count-the-number-of-rows-in-every-table-in-a-database-in-no-time/#comments</comments>
		<pubDate>Wed, 03 Jun 2009 04:47:43 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[DMV]]></category>
		<category><![CDATA[Row Count]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=307</guid>
		<description><![CDATA[Here is a piece of T-SQL code that uses system views (strictly speaking these are catalog views rather than DMVs, Dynamic Management Views) to give an approximate row count of every table in your database in virtually no time at all. Bear in mind it reads the row counts SQL Server keeps in its metadata rather than counting the rows, so it won&#8217;t always be 100% accurate &#8211; but it&#8217;s far quicker than doing a Count(*) [...]]]></description>
			<content:encoded><![CDATA[<p>Here is a piece of T-SQL code that uses system views (strictly speaking these are catalog views rather than DMVs, Dynamic Management Views, which sit alongside them) to give an <em>approximate</em> row count of every table in your database in virtually no time at all. Bear in mind it reads the row counts SQL Server keeps in its metadata rather than counting the rows, so it won&#8217;t always be 100% accurate &#8211; but it&#8217;s far quicker than doing a Count(*) table by table when you just need a rough idea for sizing. If you don&#8217;t know what system views there are, go poke around in SSMS &#8211; [Database] &gt; Views &gt; System Views and see what&#8217;s there. Anyway, the code:</p>
<pre><span style="color: #0000ff;">SELECT</span>    s.[Name] as [Schema]
,        t.[name] as [Table]
,        <span style="color: #ff00ff;">SUM</span>(p.rows) as [RowCount]

<span style="color: #0000ff;">FROM</span> <span style="color: #339966;">sys.schemas</span> s
<span style="color: #808080;">LEFT JOIN</span> <span style="color: #339966;">sys.tables</span> t
<span style="color: #0000ff;">ON</span> s.<span style="color: #ff00ff;">schema_id</span> = t.<span style="color: #ff00ff;">schema_id</span>

<span style="color: #808080;">LEFT JOIN</span> <span style="color: #339966;">sys.partitions</span> p
<span style="color: #0000ff;">ON</span> t.<span style="color: #ff00ff;">object_id</span> = p.<span style="color: #ff00ff;">object_id</span>

<span style="color: #808080;">LEFT JOIN</span>  <span style="color: #339966;">sys.allocation_units</span> a
<span style="color: #0000ff;">ON</span>  p.hobt_id = a.container_id <span style="color: #339966;">-- in-row data allocation units are keyed on hobt_id</span>

<span style="color: #0000ff;">WHERE</span>    p.index_id  in(0,1) <span style="color: #339966;">-- 0 heap table , 1 table with clustered index</span>
<span style="color: #808080;">AND</span>        p.rows is not null
<span style="color: #808080;">AND</span>        a.type = 1  <span style="color: #339966;">-- row-data only , not LOB</span>

<span style="color: #0000ff;">GROUP BY</span> s.[Name], t.[name]
<span style="color: #0000ff;">ORDER BY</span> 1,2</pre>
<p>For a very swift explanation &#8211; <span style="color: #339966;">sys.schemas</span> and <span style="color: #339966;">sys.tables</span> list the schemas and tables in the database, so joining these together on <span style="color: #ff00ff;">schema_id</span> gives a list of all tables by schema in the database. Adding on <span style="color: #339966;">sys.partitions</span> then pulls in the partitions associated with each table, and finally <span style="color: #339966;">sys.allocation_units</span> pulls in the allocation units. I&#8217;m not quite sure what those are &#8211; the guts of this query were pulled from another blog which I embarrassingly can&#8217;t trace back to now.</p>
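<p>If you would rather read the counts from a genuine DMV, <span style="color: #339966;">sys.dm_db_partition_stats</span> exposes a row_count for each partition, which removes the need to touch allocation units at all. The following is a minimal sketch along the same lines as the query above, not a tested replacement for it; it assumes you have VIEW DATABASE STATE permission on the database.</p>
<pre>SELECT    s.[name] as [Schema]
,        t.[name] as [Table]
,        SUM(ps.row_count) as [RowCount]

FROM sys.dm_db_partition_stats ps
INNER JOIN sys.tables t
ON ps.object_id = t.object_id

INNER JOIN sys.schemas s
ON t.schema_id = s.schema_id

WHERE    ps.index_id in(0,1) -- 0 heap table , 1 table with clustered index

GROUP BY s.[name], t.[name]
ORDER BY 1,2</pre>
<p>Restricting to index_id 0 or 1 matters here for the same reason as before: nonclustered indexes have their own partitions, and including them would count the same rows more than once.</p>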
<p>I&#8217;m no expert on DMVs, so if you have any views on the quality of this query, please leave a comment with your thoughts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2009/06/count-the-number-of-rows-in-every-table-in-a-database-in-no-time/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

