<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BI Monkey &#187; SQL Server</title>
	<atom:link href="http://www.bimonkey.com/category/sqlserver/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bimonkey.com</link>
	<description>James Beresford on Microsoft BI and Consulting in Sydney, Australia</description>
	<lastBuildDate>Mon, 23 Jan 2012 22:01:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>SQL Pass Day #3</title>
		<link>http://www.bimonkey.com/2011/10/sql-pass-day-3/</link>
		<comments>http://www.bimonkey.com/2011/10/sql-pass-day-3/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 23:46:55 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1108</guid>
		<description><![CDATA[The third and final day at SQL Pass opened with me at the bloggers&#8217; table (though only able to tweet manically) watching Dr DeWitt&#8217;s keynote &#8211; and I can see why his keynotes are so highly regarded. His subject was Big Data, and despite the potential for this to be a dull and [...]]]></description>
			<content:encoded><![CDATA[<p>The third and final day at SQL Pass opened with me at the bloggers&#8217; table (though only able to tweet manically) watching Dr DeWitt&#8217;s keynote &#8211; and I can see why his keynotes are so highly regarded. His subject was Big Data, and despite the potential for this to be a dull and impenetrable subject area, he gave a great and illuminating talk on the topic.</p>
<p>Topics that he covered included:</p>
<ul>
<li>ACID vs. BASE (i.e. the battle between consistency of data and availability of data)</li>
<li>NoSQL is a means of querying raw data with no cleansing / structure / ETL</li>
<li>His expectation is that Structured (SQL) and Unstructured (Hadoop) data will coexist in organisations</li>
<li>Hadoop consists of Storage (HDFS) and Process (MapReduce)</li>
<li>MapReduce is too complex to work with directly, so languages such as Hive and Pig sit on top of it</li>
<li>Sqoop is the tool to make Unstructured and Structured data talk &#8211; but performance is not good</li>
</ul>
<p>I can&#8217;t really do his talk justice, but I now understand Hadoop a whole lot better &#8211; essentially it&#8217;s a read-only store of unstructured data, a very different beast from a relational database and addressing totally different needs.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/10/sql-pass-day-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Pass Day #2</title>
		<link>http://www.bimonkey.com/2011/10/sql-pass-day-2/</link>
		<comments>http://www.bimonkey.com/2011/10/sql-pass-day-2/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 00:54:17 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Denali]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[SQL Pass 2011]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1104</guid>
		<description><![CDATA[So, on to day 2 of SQL Pass, and a very SSIS-focused one &#8211; mainly because I attended Matt Masson&#8217;s SSIS session and learned about a whole bunch of nice new features that have made it into the next version. I took away the following interesting points from that session: CDC (Change Data [...]]]></description>
			<content:encoded><![CDATA[<p>So, on to day 2 of SQL Pass, and a very SSIS-focused one &#8211; mainly because I attended Matt Masson&#8217;s SSIS session and learned about a whole bunch of nice new features that have made it into the next version.</p>
<p>I took away the following interesting points from that session:</p>
<ol>
<li><strong>CDC (Change Data Capture)</strong> is supported more effectively through some new components &#8211; a CDC Control Task, a CDC Source and a CDC Splitter</li>
<li><strong>ODBC </strong>improvements mean better performance for non-SQL Server databases</li>
<li><strong>Connection Managers </strong>get a few new features &#8211; Offline, Expression and Project indicators. Offline Connection Managers are also detected via a timeout and, importantly, now <strong>halt validation</strong> of any related components (so you no longer sit through those delays while SSIS tries to validate components and flows hooked to dead connections)</li>
<li><strong>Flat File Connection Manager</strong> can now handle a variable number of columns per row (i.e. it won&#8217;t fall over on ragged files)</li>
<li><strong>Pivot </strong>gets a UI &#8211; hurrah! (Note: SCD still sucks)</li>
<li>Project <strong>Parameters </strong>can be configured at design time through Visual Studio Configurations</li>
<li><strong>Breakpoints </strong>are now available in the Script Component, so we can see what data is causing our components to blow up</li>
<li><strong>Data Taps</strong> &#8211; data viewers for live execution that dump out to .csv files</li>
<li>Package <strong>Execution </strong>via PowerShell or even T-SQL!</li>
</ol>
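<p>As the feature eventually shipped in SQL Server 2012, T-SQL execution goes through stored procedures in the SSISDB catalog. Below is a minimal sketch that assumes a project has already been deployed to the catalog. The folder, project and package names are hypothetical placeholders, so treat this as a starting point rather than a recipe:</p>
<pre>
-- Hypothetical names: substitute your own deployed folder, project and package.
DECLARE @execution_id BIGINT;

-- Create an execution instance for the package (this does not run it yet).
EXEC [SSISDB].[catalog].[create_execution]
     @folder_name     = N'DemoFolder',
     @project_name    = N'DemoProject',
     @package_name    = N'LoadSales.dtsx',
     @use32bitruntime = 0,
     @reference_id    = NULL,
     @execution_id    = @execution_id OUTPUT;

-- Optional: make start_execution wait for the package to finish
-- (SYNCHRONIZED is a system parameter, object type 50).
EXEC [SSISDB].[catalog].[set_execution_parameter_value]
     @execution_id    = @execution_id,
     @object_type     = 50,
     @parameter_name  = N'SYNCHRONIZED',
     @parameter_value = 1;

-- Run the package.
EXEC [SSISDB].[catalog].[start_execution] @execution_id;

-- Check the outcome: status 7 = succeeded, 4 = failed.
SELECT execution_id, status
FROM   [SSISDB].[catalog].[executions]
WHERE  execution_id = @execution_id;
</pre>
<p>Without the SYNCHRONIZED parameter the call returns as soon as the package has started, so the status query would most likely show it as still running.</p>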
<p>Matt also gave a preview of Barcelona in action, and it looks pretty neat.</p>
<p>I also attended a DQS session that showed a few new features in the UI. Elad Ziklik highlighted that the CTP3 release should be viewed more as a capability preview than as a test drive of a functional product &#8211; so I&#8217;m looking forward to a new CTP.</p>
<p>One more day down &#8211; one to go&#8230;!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/10/sql-pass-day-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Pass Day #1</title>
		<link>http://www.bimonkey.com/2011/10/sql-pass-day-1/</link>
		<comments>http://www.bimonkey.com/2011/10/sql-pass-day-1/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 00:28:36 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Denali]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[SQL Pass 2011]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1091</guid>
		<description><![CDATA[So the BI Monkey was at PASS and intended to blog from his phone or at one of the many internet pods provided, but sadly WordPress and IE / Android aren&#8217;t friends, so this is going to be a retrospective. So, here are my takeaways from day 1: From the keynote: Denali becomes SQL Server 2012 and is [...]]]></description>
			<content:encoded><![CDATA[<p>So the BI Monkey was at PASS and intended to blog from his phone or at one of the many internet pods provided, but sadly WordPress and IE / Android aren&#8217;t friends, so this is going to be a retrospective.</p>
<p>So, here are my takeaways from day 1:</p>
<p><em>From the keynote:</em></p>
<p>Denali becomes SQL Server 2012 and is slated for release in the first half of next year (way to allow yourself some leeway on the final release date!)</p>
<p>Crescent becomes Power View &#8211; and will work on multiple mobile platforms &#8211; Apple &amp; Android via the browser, WP7 via a richer app</p>
<p>Hadoop is going to be supported on Windows Server &#8211; the first CTP is due next year. If you have access to the PASS DVDs / sessions, see Dr DeWitt&#8217;s presentation on Hadoop &#8211; very enlightening</p>
<p><em>From wandering around the summit:</em></p>
<p>According to Elad Ziklik of the DQS team, performance is one of their focus areas for the next release</p>
<p>I got to finally meet Ivan Peev of <a title="CozyRoc" href="http://www.cozyroc.com/">CozyRoc</a> after many years of email and phone conversations, and he told me about their new SAS Connector &#8211; which allows reading and writing to SAS datasets without an actual SAS install.</p>
<p>I also got to meet Matt Masson, guru of the SSIS team who told me about <a title="Project Barcelona Team Blog" href="http://blogs.msdn.com/b/project_barcelona_team_blog/">Project Barcelona</a> &#8211; a tool that will do data lineage, metadata management and impact analysis via a crawler as opposed to a manually maintained set.</p>
<p>I also got to sit in on a customer feedback session about BI in the Cloud &#8211; unfortunately all under NDA &#8211; but it was a great forum to discuss and help direct Microsoft&#8217;s Cloud BI ambitions.</p>
<p>I also had a chat with fellow Aussie BI guy <a title="Roger Noble" href="http://www.rogernoble.com/">Roger Noble</a> who told me about a use for the Term Extraction transformation in SSIS &#8211; using it to scan through documents and auto-tag them as they were uploaded to SharePoint &#8211; which is pretty cool.</p>
<p>So, that was Day 1&#8230; Day 2 to follow!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/10/sql-pass-day-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Columnstore Indexes revisited</title>
		<link>http://www.bimonkey.com/2011/09/columnstore-indexes-revisited/</link>
		<comments>http://www.bimonkey.com/2011/09/columnstore-indexes-revisited/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 23:27:22 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Columnstore]]></category>
		<category><![CDATA[Denali]]></category>
		<category><![CDATA[Indexes]]></category>
		<category><![CDATA[Vertipaq]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1058</guid>
		<description><![CDATA[Having now researched Columnstore Indexes further, I thought I&#8217;d share the key learning I&#8217;ve picked up on this feature &#8211; which now sounds even more powerful than I&#8217;d originally thought. The most important thing to take away is that a Columnstore Index should actually cover the entire table. Its name is a little misleading &#8211; [...]]]></description>
			<content:encoded><![CDATA[<p>Having now researched Columnstore Indexes further, I thought I&#8217;d share the key learning I&#8217;ve picked up on this feature &#8211; which now sounds even more powerful than I&#8217;d originally thought.</p>
<p>The most important thing to take away is that a Columnstore Index should actually cover the <strong>entire table</strong>. Its name is a little misleading &#8211; the feature is less of an index, and more of a shadow copy of the table&#8217;s data, compressed with the Vertipaq voodoo. I suspect they have used the term index because the Columnstore doesn&#8217;t cover all data types &#8211; the important ones are there, but some extreme decimals and blobs are excluded &#8211; for a full list see the <a title="Columnstore Indexes MSDN Documentation" href="http://msdn.microsoft.com/en-us/library/gg492088%28v=sql.110%29.aspx">MSDN documentation</a>. So for any big table, whack a Columnstore index across the entire table.</p>
<p>Next up is to understand how to use them and how to detect when they are or are not being used. The key thing is to only use them in isolation (e.g. summary queries) or with <strong>Inner Joins</strong>. Outer Joins don&#8217;t get the batch-mode benefit right now, though there are cunning workarounds that apply if you are Outer Joining to summary data &#8211; see Eric Hanson&#8217;s video referenced below, somewhere around the 50 minute mark.</p>
<p>You can detect when they are being used from the Execution Mode shown in the Query Plan. This property is new in Denali and is either Row or Batch: Row means traditional row-at-a-time SQL Server execution, and Batch means the Columnstore&#8217;s batch processing is being used.</p>
<p>So, the key takeaways:</p>
<ul>
<li>For any large table put a Columnstore index across the entire table</li>
<li>Only join using Inner Joins</li>
<li>Spot the use of the Columnstore in Query plans via the Execution Mode of Batch</li>
</ul>
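<p>To put the whole-table advice into practice, here is a minimal sketch using the SQL Server 2012 syntax for a nonclustered columnstore index (an Enterprise / Developer edition feature in that release). The table and column names are hypothetical. Every column goes into the index, and the catalog query then confirms that the index (type 6 in sys.indexes) covers all of the table&#8217;s columns:</p>
<pre>
-- Hypothetical demo fact table.
CREATE TABLE dbo.FactSales_Demo
(
    DateKey     INT            NOT NULL,
    ProductKey  INT            NOT NULL,
    StoreKey    INT            NOT NULL,
    Quantity    INT            NOT NULL,
    SalesAmount DECIMAL(18, 2) NOT NULL
);

INSERT INTO dbo.FactSales_Demo (DateKey, ProductKey, StoreKey, Quantity, SalesAmount)
VALUES (20111001, 1, 10, 5, 50.00),
       (20111001, 2, 10, 3, 45.00),
       (20111002, 1, 20, 7, 70.00);

-- Cover every column of the table, not just the ones you filter on.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales_Demo
ON dbo.FactSales_Demo (DateKey, ProductKey, StoreKey, Quantity, SalesAmount);

-- Confirm the columnstore index exists and covers all of the table's columns.
SELECT i.name,
       i.type_desc,
       COUNT(ic.column_id) AS indexed_columns,
       (SELECT COUNT(*) FROM sys.columns c
        WHERE c.object_id = i.object_id) AS table_columns
FROM   sys.indexes i
JOIN   sys.index_columns ic
       ON ic.object_id = i.object_id AND ic.index_id = i.index_id
WHERE  i.object_id = OBJECT_ID(N'dbo.FactSales_Demo')
  AND  i.type = 6
GROUP BY i.name, i.type_desc, i.object_id;

-- Clean up afterwards with: DROP TABLE dbo.FactSales_Demo;
</pre>
<p>To see whether a particular query is getting the benefit, run it with the Actual Execution Plan switched on and check the Actual Execution Mode property of the Columnstore Index Scan operator. Expect Row on a three-row demo table like this one, since batch mode in SQL Server 2012 requires a parallel plan, which the optimizer will not choose for a handful of rows.</p>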
<p>Useful reference material:</p>
<ul>
<li><a title="Columnstore Indexes Unveiled" href="http://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI312">Eric Hanson&#8217;s Tech-Ed video</a> (a bit dull, but informative &#8211; if you know your Data Warehousing theory, skip the first 15 minutes)</li>
<li>The TechNet <a title="SQL Server Columnstore Index FAQ" href="http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-index-faq.aspx">Columnstore Index FAQ</a></li>
<li><a title="Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0" href="http://download.microsoft.com/download/8/C/1/8C1CE06B-DE2F-40D1-9C5C-3EE521C25CE9/Columnstore%20Indexes%20for%20Fast%20DW%20QP%20SQL%20Server%2011.pdf">Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0</a> &#8211; Whitepaper by Eric Hanson</li>
<li><a title="Columnstore Indexes MSDN Documentation" href="http://msdn.microsoft.com/en-us/library/gg492088%28v=sql.110%29.aspx">MSDN documentation</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/09/columnstore-indexes-revisited/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OLE DB is dead &#8211; Long live ODBC!</title>
		<link>http://www.bimonkey.com/2011/09/ole-db-is-dead-long-live-odbc/</link>
		<comments>http://www.bimonkey.com/2011/09/ole-db-is-dead-long-live-odbc/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 04:21:53 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[ODBC]]></category>
		<category><![CDATA[OLE DB]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1055</guid>
		<description><![CDATA[Something I picked up in passing at the MSDN SSIS forum &#8211; SQL Native Client OLE DB is being deprecated post Denali. See this post on the SQL Native Client Team Blog. What does this mean? First up this only applies to SQL Server connections &#8211; other OLE DB connectivity such as Teradata and Oracle [...]]]></description>
			<content:encoded><![CDATA[<p>Something I picked up in passing at the <a title="SQL Server Integration Services" href="http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/threads">MSDN SSIS forum</a> &#8211; the SQL Server Native Client OLE DB provider is being deprecated after Denali. See <a title="Microsoft is Aligning with ODBC for Native Relational Data Access" href="http://blogs.msdn.com/b/sqlnativeclient/archive/2011/08/29/microsoft-is-aligning-with-odbc-for-native-relational-data-access.aspx">this post on the SQL Native Client Team Blog</a>.</p>
<p>What does this mean?</p>
<ol>
<li>First up, this only applies to SQL Server connections &#8211; OLE DB connectivity to other databases such as Teradata and Oracle will be unaffected.</li>
<li>In the short term, nothing &#8211; if you are using OLE DB it will carry on working for a fair few years yet. But if you are planning for a system&#8217;s longevity, look to using ODBC for all your SQL Server connectivity needs.</li>
</ol>
<p>It&#8217;s a slightly confusing move given that ODBC has generally underperformed OLE DB in SSIS &#8211; so it&#8217;s bad news for us SSIS people, unless part of the change also includes some significant performance and capability improvements.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/09/ole-db-is-dead-long-live-odbc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Columnstore indexes in Denali (aka: &#8220;Apollo&#8221;)</title>
		<link>http://www.bimonkey.com/2011/08/columnstore-indexes-in-denali-aka-apollo/</link>
		<comments>http://www.bimonkey.com/2011/08/columnstore-indexes-in-denali-aka-apollo/#comments</comments>
		<pubDate>Tue, 02 Aug 2011 04:52:39 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Columnstore]]></category>
		<category><![CDATA[Denali]]></category>
		<category><![CDATA[Indexes]]></category>
		<category><![CDATA[Vertipaq]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1010</guid>
		<description><![CDATA[James Serra has a great post on a new feature in SQL Server Denali &#8211; Columnstore indexes. The tl;dr version is this: Columnstore indexes use the Vertipaq compression engine (that&#8217;s the shiny compression engine in PowerPivot) to compress index data, which can make queries against them between 10 and 100 times faster. The most significant limitation is that tables [...]]]></description>
			<content:encoded><![CDATA[<p><a title="James Serra" href="http://www.jamesserra.com/">James Serra</a> has a <a title="SQL Server “Denali”: Project Apollo" href="http://www.jamesserra.com/archive/2011/08/sql-server-%e2%80%9cdenali%e2%80%9d-project-apollo/">great post on a new feature in SQL Server Denali &#8211; Columnstore indexes</a>.</p>
<p>The tl;dr version is this:</p>
<p>Columnstore indexes use the Vertipaq compression engine (that&#8217;s the shiny compression engine in PowerPivot) to compress index data, which can make queries against them between 10 and 100 times faster.</p>
<p>The most significant limitation is that tables become read-only once they have a columnstore index (no Inserts / Updates / Deletes etc) &#8211; though James notes you can work around this by using Partitions if your tables are purely additive. Otherwise the indexes will need to be dropped and recreated as data changes.</p>
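<p>If you take the drop-and-recreate route, SQL Server 2012 also lets you disable a nonclustered columnstore index, load the new data and then rebuild the index, which re-creates it over the current contents of the table. A minimal sketch on a hypothetical table:</p>
<pre>
-- Hypothetical table with a nonclustered columnstore index (SQL Server 2012).
CREATE TABLE dbo.FactOrders_Demo
(
    OrderDateKey INT            NOT NULL,
    CustomerKey  INT            NOT NULL,
    Amount       DECIMAL(18, 2) NOT NULL
);

INSERT INTO dbo.FactOrders_Demo (OrderDateKey, CustomerKey, Amount)
VALUES (20110801, 1, 100.00);

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactOrders_Demo
ON dbo.FactOrders_Demo (OrderDateKey, CustomerKey, Amount);

-- While the index is enabled, the INSERT below would fail on SQL Server 2012
-- because the table is read-only. Disabling the index makes it writable again.
ALTER INDEX NCCI_FactOrders_Demo ON dbo.FactOrders_Demo DISABLE;

INSERT INTO dbo.FactOrders_Demo (OrderDateKey, CustomerKey, Amount)
VALUES (20110802, 2, 250.00);

-- Rebuilding re-creates the columnstore over the new data and re-enables it.
ALTER INDEX NCCI_FactOrders_Demo ON dbo.FactOrders_Demo REBUILD;

-- Clean up afterwards with: DROP TABLE dbo.FactOrders_Demo;
</pre>
<p>From SQL Server 2016 onwards nonclustered columnstore indexes are updatable, so this step is only needed on the earlier versions.</p>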
<p>So &#8211; a powerful new indexing feature which, with careful management, can have a serious positive impact on the performance of your Data Warehouse.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/08/columnstore-indexes-in-denali-aka-apollo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ranking and Numbering rows &#8211; and subsets of rows &#8211; in T-SQL</title>
		<link>http://www.bimonkey.com/2011/05/ranking-and-numbering-rows-and-subsets-of-rows-in-t-sql/</link>
		<comments>http://www.bimonkey.com/2011/05/ranking-and-numbering-rows-and-subsets-of-rows-in-t-sql/#comments</comments>
		<pubDate>Thu, 19 May 2011 11:25:48 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Pivot]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=901</guid>
		<description><![CDATA[I recently had to deal with a scenario where I needed to pivot out some rows after ordering (ranking) them according to specific rules so I could present some rows of data as columns, but in a specific order (don&#8217;t ask why, it&#8217;ll make me grind my teeth about data analysts that don&#8217;t understand how [...]]]></description>
			<content:encoded><![CDATA[<p>I recently had to deal with a scenario where I needed to pivot out some rows after ordering (ranking) them according to specific rules, so I could present some rows of data as columns in a specific order (don&#8217;t ask why, it&#8217;ll make me grind my teeth about data analysts that don&#8217;t understand how to analyse data&#8230;). The ordering itself was only part of the solution: to Pivot the data, the values that become columns have to be listed explicitly in the query, so the natural keys can&#8217;t be used. The scenario is set out below:</p>
<div class="wp-caption alignnone" style="width: 639px"><img class=" " title="Fig 1: Rank and Pivot. The Rank column needed to be added" src="http://www.bimonkey.com/uploads/sql/ROW_NUMBER_1.jpg" alt="Fig 1: Rank and Pivot. The Rank column needed to be added" width="629" height="146" /><p class="wp-caption-text">Fig 1: Rank and Pivot. The Rank column needed to be added</p></div>
<p>My first thought was that I&#8217;d have to solve this with a cursor, which wasn&#8217;t a practical option as there were 1.5m rows of data to process, and if my solution involves a cursor I instantly think it&#8217;s a lousy solution. However I was pleased to discover the T-SQL function <strong>ROW_NUMBER() </strong>which allows you to add row numbering to ordered data and even subgroups of that data. (The below samples use the AdventureWorks2008 database.)</p>
<p>First up, basic row numbering:</p>
<blockquote><p><span style="color: #0000ff;">SELECT</span> <span style="color: #ff00ff;"> ROW_NUMBER</span>() <span style="color: #0000ff;">OVER</span> (<span style="color: #0000ff;">ORDER BY</span> ProductId) <span style="color: #0000ff;">AS</span> ID_Key<br />
,        [ProductID]<br />
,        [LocationID]<br />
,        [Shelf]<br />
,        [Bin]<br />
,        [Quantity]</p>
<p><span style="color: #0000ff;">FROM</span> [Production].[ProductInventory]</p>
<p><span style="color: #0000ff;">WHERE</span> [ProductID] <span style="color: #808080;">IN</span> (1,2,3,4)</p></blockquote>
<p>The above query adds an ID key to the data based on ordering by the ProductID field. The <strong>ROW_NUMBER()</strong> function requires an <strong>OVER</strong> clause to know on what basis it should assign the key, and that clause must include an ORDER BY. The end result looks like this:</p>
<div class="wp-caption alignnone" style="width: 356px"><img title="Fig 2: Simple row numbering" src="http://www.bimonkey.com/uploads/sql/ROW_NUMBER_2.jpg" alt="Fig 2: Simple row numbering" width="346" height="285" /><p class="wp-caption-text">Fig 2: Simple row numbering</p></div>
<p>You can extend this to number rows within a subgroup by specifying a <strong>PARTITION BY</strong> clause, so that ROW_NUMBER() restarts at 1 for each subgroup. In the example below I partition by ProductId:</p>
<blockquote><p><span style="color: #0000ff;"> SELECT</span> <span style="color: #ff00ff;">ROW_NUMBER</span>() <span style="color: #0000ff;">OVER</span> (<span style="color: #0000ff;">PARTITION BY</span> ProductId <span style="color: #0000ff;">ORDER BY</span> Quantity<span style="color: #0000ff;"> DESC</span>) <span style="color: #0000ff;">AS</span> Subset_ID_Key<br />
,        [ProductID]<br />
,        [LocationID]<br />
,        [Shelf]<br />
,        [Bin]<br />
,        [Quantity]</p>
<p><span style="color: #0000ff;">FROM</span> [Production].[ProductInventory]</p>
<p><span style="color: #0000ff;">WHERE</span> [ProductID] <span style="color: #808080;">IN</span> (1,2,3,4)</p></blockquote>
<p>Which yields this result, with the ranking now only applying within a Product Id:</p>
<div class="wp-caption alignnone" style="width: 396px"><img title="Fig 3: Row numbering within a Subgroup" src="http://www.bimonkey.com/uploads/sql/ROW_NUMBER_3.jpg" alt="Fig 3: Row numbering within a Subgroup" width="386" height="280" /><p class="wp-caption-text">Fig 3: Row numbering within a Subgroup</p></div>
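<p>One caveat: if two rows in the same partition share the same Quantity, ROW_NUMBER() numbers them in an arbitrary order, which can change from one execution to the next. Adding a unique column as a tie-breaker makes the numbering repeatable, whereas RANK() and DENSE_RANK() give tied rows the same number. A self-contained sketch using a table variable with made-up rows (not AdventureWorks data):</p>
<pre>
-- Made-up rows: product 1 has two locations tied on Quantity = 300.
DECLARE @Inventory TABLE
(
    ProductID  INT      NOT NULL,
    LocationID SMALLINT NOT NULL,
    Quantity   SMALLINT NOT NULL
);

INSERT INTO @Inventory (ProductID, LocationID, Quantity)
VALUES (1, 1, 400),
       (1, 6, 300),
       (1, 50, 300),
       (2, 1, 250);

SELECT ProductID,
       LocationID,
       Quantity,
       -- LocationID breaks ties, so the numbering is repeatable
       ROW_NUMBER() OVER (PARTITION BY ProductID
                          ORDER BY Quantity DESC, LocationID) AS RowNum,
       -- ties share a rank and the following rank is skipped (1, 2, 2, 4 ...)
       RANK()       OVER (PARTITION BY ProductID
                          ORDER BY Quantity DESC) AS Rnk,
       -- ties share a rank with no gaps (1, 2, 2, 3 ...)
       DENSE_RANK() OVER (PARTITION BY ProductID
                          ORDER BY Quantity DESC) AS DenseRnk
FROM   @Inventory
ORDER BY ProductID, RowNum;
</pre>
<p>For product 1, the two locations holding 300 get RowNum 2 and 3, but both get Rnk 2 and DenseRnk 2.</p>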
<p>This can then be pivoted on the row number, as its values (1, 2, 3&#8230;) are now known in advance and can be listed in the PIVOT clause:</p>
<blockquote><p><span style="color: #0000ff;">SELECT</span> ProductID<br />
,        [1] <span style="color: #0000ff;">AS</span> Bin_1<br />
,        [2] <span style="color: #0000ff;">AS</span> Bin_2<br />
,        [3] <span style="color: #0000ff;">AS</span> Bin_3</p>
<p><span style="color: #0000ff;">FROM</span></p>
<p>(</p>
<p><span style="color: #0000ff;">SELECT</span> <span style="color: #ff00ff;">ROW_NUMBER</span>()<span style="color: #0000ff;"> OVER</span> (<span style="color: #0000ff;">PARTITION BY</span> ProductId <span style="color: #0000ff;">ORDER BY</span> Quantity <span style="color: #0000ff;">DESC</span>) <span style="color: #0000ff;">AS</span> Subset_ID_Key<br />
,        [ProductID]<br />
,        [Bin]</p>
<p><span style="color: #0000ff;">FROM</span> [Production].[ProductInventory]</p>
<p><span style="color: #0000ff;">WHERE</span> [ProductID] <span style="color: #808080;">IN</span> (1,2,3,4)</p>
<p>) <span style="color: #0000ff;">AS</span> Pivot_Source</p>
<p><span style="color: #808080;">PIVOT</span></p>
<p>(<br />
<span style="color: #ff00ff;">MAX</span>(Bin)<br />
<span style="color: #0000ff;">FOR</span> Subset_ID_Key <span style="color: #808080;">IN</span> ([1],[2],[3])<br />
) <span style="color: #0000ff;">AS</span> Pivot_Output</p></blockquote>
<p>Which yields this final output:</p>
<div class="wp-caption alignnone" style="width: 248px"><img title="Fig 4: Ranked and Pivoted" src="http://www.bimonkey.com/uploads/sql/ROW_NUMBER_4.jpg" alt="Fig 4: Ranked and Pivoted" width="238" height="129" /><p class="wp-caption-text">Fig 4: Ranked and Pivoted</p></div>
<p>All done within a single query, and not a cursor in sight. ROW_NUMBER() was a great function to discover!</p>
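<p>For reference, here is a self-contained sketch of the same rank-then-pivot idea that runs without AdventureWorks, using a table variable with made-up rows. It swaps the PIVOT operator for conditional aggregation (MAX(CASE ...) grouped by ProductID), which produces the same shape of output and is sometimes easier to extend. A tie-breaker on LocationID keeps the column order repeatable when quantities are equal:</p>
<pre>
-- Made-up sample rows so the example runs without AdventureWorks.
DECLARE @Inventory TABLE
(
    ProductID  INT      NOT NULL,
    LocationID SMALLINT NOT NULL,
    Bin        TINYINT  NOT NULL,
    Quantity   SMALLINT NOT NULL
);

INSERT INTO @Inventory (ProductID, LocationID, Bin, Quantity)
VALUES (1, 1, 1, 400),
       (1, 6, 5, 300),
       (1, 50, 7, 200),
       (2, 1, 2, 250),
       (2, 6, 4, 150);

WITH Ranked AS
(
    -- Number each product's bins by descending quantity;
    -- LocationID breaks ties so the order is repeatable.
    SELECT ProductID,
           Bin,
           ROW_NUMBER() OVER (PARTITION BY ProductID
                              ORDER BY Quantity DESC, LocationID) AS Subset_ID_Key
    FROM   @Inventory
)
-- Conditional aggregation: one output column per row number.
SELECT ProductID,
       MAX(CASE WHEN Subset_ID_Key = 1 THEN Bin END) AS Bin_1,
       MAX(CASE WHEN Subset_ID_Key = 2 THEN Bin END) AS Bin_2,
       MAX(CASE WHEN Subset_ID_Key = 3 THEN Bin END) AS Bin_3
FROM   Ranked
GROUP BY ProductID
ORDER BY ProductID;
</pre>
<p>With these rows, product 1 comes out as Bin_1 = 1, Bin_2 = 5 and Bin_3 = 7, and product 2 as 2, 4 and NULL, because it only has two locations.</p>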
<p>MSDN Documentation is here for:</p>
<ul>
<li><a title="ROW_NUMBER" href="http://msdn.microsoft.com/en-us/library/ms186734.aspx">ROW_NUMBER()</a> &#8211; the key function</li>
<li><a title="OVER Clause" href="http://msdn.microsoft.com/en-us/library/ms189461.aspx">OVER</a> &#8211; ordering and subgrouping the results of ROW_NUMBER</li>
<li><a title="Using PIVOT and UNPIVOT" href="http://msdn.microsoft.com/en-us/library/ms177410.aspx">PIVOT</a> &#8211; for pivoting out the results</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/05/ranking-and-numbering-rows-and-subsets-of-rows-in-t-sql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Use the Index, Luke</title>
		<link>http://www.bimonkey.com/2011/01/use-the-index-luke/</link>
		<comments>http://www.bimonkey.com/2011/01/use-the-index-luke/#comments</comments>
		<pubDate>Sun, 30 Jan 2011 06:01:59 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=890</guid>
		<description><![CDATA[While I&#8217;m on a T-SQL bent, I stumbled upon (literally, beware of StumbleUpon as a way to waste heroic amounts of time) a free eBook called &#8220;Use The Index, Luke&#8221;. It&#8217;s by a chap called Markus Winand, who I freely admit to having never heard of (and Google isn&#8217;t particularly enlightening). However it&#8217;s a work [...]]]></description>
			<content:encoded><![CDATA[<p>While I&#8217;m on a T-SQL bent, I stumbled upon (literally, beware of StumbleUpon as a way to waste heroic amounts of time) a free eBook called <a title="Use The Index, Luke | e-Book about SQL Indexing in Oracle, SQL Server" href="http://use-the-index-luke.com/">&#8220;Use The Index, Luke&#8221;</a>. It&#8217;s by a chap called Markus Winand, who I freely admit to having never heard of (and Google isn&#8217;t particularly enlightening).</p>
<p>However, it&#8217;s a work in progress explaining in relatively clear terms what database indexes are, how they work and how they can impact performance. It&#8217;s a pretty important topic for anyone who writes SQL, so I&#8217;d recommend reading what&#8217;s there and tracking its progress.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/01/use-the-index-luke/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Why to avoid DISTINCT and GROUP BY to get unique records</title>
		<link>http://www.bimonkey.com/2011/01/why-to-avoid-distinct-and-group-by-to-get-unique-records/</link>
		<comments>http://www.bimonkey.com/2011/01/why-to-avoid-distinct-and-group-by-to-get-unique-records/#comments</comments>
		<pubDate>Sat, 29 Jan 2011 10:45:47 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=886</guid>
		<description><![CDATA[This is a quick and dirty post on the use of DISTINCT or GROUP BY to get unique records, based on something I helped a developer with over the last couple of weeks. Their thought process was that because they were getting duplicate records, the easiest way to get rid of them was to slap [...]]]></description>
			<content:encoded><![CDATA[<p>This is a quick and dirty post on the use of DISTINCT or GROUP BY to get unique records, based on something I helped a developer with over the last couple of weeks.</p>
<p>Their thought process was that because they were getting duplicate records, the easiest way to get rid of them was to slap a DISTINCT at the start of the query to get unique results. Which, in a sense, is OK &#8211; because it worked (sort of). However, there are two very good reasons why this is not always a good approach.</p>
<h2>#1: Your query is wrong</h2>
<p>If you are getting back duplicate records, it probably means your query is wrong. The example below is admittedly imperfect (the first query returns far more than intended), but it was close to what I was dealing with:</p>
<blockquote><p>USE AdventureWorks</p>
<p><span style="color: #008000;">/* Query 1: Using DISTINCT to try to eliminate duplicates */</span></p>
<p>SELECT DISTINCT<br />
s.Name,<br />
CASE<br />
WHEN sc.ContactTypeID = 11 THEN 'Y'<br />
ELSE 'N'<br />
END AS OwnerContact<br />
FROM Sales.Store s<br />
LEFT JOIN Sales.StoreContact sc<br />
ON s.CustomerID = sc.CustomerID</p>
<p><span style="color: #008000;">/* Query 2: Using a properly formed WHERE clause (filtering on sc makes this an INNER JOIN) */</span></p>
<p>SELECT s.Name,<br />
'Y' AS OwnerContact<br />
FROM Sales.Store s<br />
INNER JOIN Sales.StoreContact sc<br />
ON s.CustomerID = sc.CustomerID<br />
WHERE sc.ContactTypeID = 11</p></blockquote>
<p>What I&#8217;m trying to illustrate with the example above is that if you consider more carefully what records you are bringing back in your joins, you are less likely to end up with duplicates. Making sure you only join to tables in a way that brings back the data you need also reduces the risk of other errors creeping in.</p>
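<p>A minimal, self-contained sketch of the idea, using table variables with made-up rows rather than AdventureWorks. The store names and contact IDs are invented, and ContactTypeID 11 stands in for the owner contact type as in the queries above. The swapped-in technique here is EXISTS (a semi-join), which asks the actual question, &#8220;does this store have an owner contact?&#8221;, instead of joining and then de-duplicating:</p>
<pre>
-- Made-up data: store 1 has two owner contacts, store 2 has none.
DECLARE @Store TABLE (CustomerID INT PRIMARY KEY, Name NVARCHAR(50) NOT NULL);
DECLARE @StoreContact TABLE (CustomerID INT NOT NULL, ContactID INT NOT NULL, ContactTypeID INT NOT NULL);

INSERT INTO @Store (CustomerID, Name)
VALUES (1, N'Bike World'), (2, N'Cycle Hub');

INSERT INTO @StoreContact (CustomerID, ContactID, ContactTypeID)
VALUES (1, 100, 11), (1, 101, 11), (1, 102, 15), (2, 200, 15);

-- The join multiplies rows: a store comes back once per matching contact.
SELECT s.Name
FROM   @Store s
INNER JOIN @StoreContact sc
        ON s.CustomerID = sc.CustomerID
WHERE  sc.ContactTypeID = 11;

-- EXISTS returns each qualifying store at most once, with no DISTINCT needed.
SELECT s.Name
FROM   @Store s
WHERE  EXISTS (SELECT 1
               FROM   @StoreContact sc
               WHERE  sc.CustomerID = s.CustomerID
                 AND  sc.ContactTypeID = 11);
</pre>
<p>The first query returns Bike World twice (once per owner contact); the EXISTS version returns it once and leaves out Cycle Hub, which has no owner contact.</p>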
<h2>#2: Performance</h2>
<p>If you are getting duplicate records, you are bringing back more data than you need. On top of this, the DISTINCT or GROUP BY operation has to go over the whole returned data set (typically with a sort or a hash) to identify unique records. This can get pretty expensive pretty quickly in terms of database operations.</p>
<p>From my perspective, badly performing queries are a lesser sin than incorrect ones. I doubt many business users will be making decisions based on query run time, but they will on the data you serve up to them.</p>
<h2>Summing up</h2>
<p>All I want to do in this post is make you pause and think before doing a DISTINCT or GROUP BY purely to eliminate duplicates &#8211; especially if you don&#8217;t really understand why you are getting them. A better designed and more accurate query can often get rid of the dupes and cut off the risk of bad data in the future.</p>
<p><em>Update 25 Jul 2011:</em> Mark Caldwell articulates it better than me @ SQL Team Blog: <a title="Why I Hate DISTINCT" href="http://weblogs.sqlteam.com/markc/archive/2008/11/11/60752.aspx">Why I Hate DISTINCT</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/01/why-to-avoid-distinct-and-group-by-to-get-unique-records/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Server BI / DW Scalability</title>
		<link>http://www.bimonkey.com/2010/12/sql-server-bi-dw-scalability/</link>
		<comments>http://www.bimonkey.com/2010/12/sql-server-bi-dw-scalability/#comments</comments>
		<pubDate>Mon, 13 Dec 2010 22:02:40 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Analysis Services]]></category>
		<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=868</guid>
		<description><![CDATA[One of the common FUD techniques deployed against SQL Server is to raise questions about its ability to scale. Rather than blather on about technical reasons why this is bunkum, I&#8217;ll just hit you up with some numbers from this springs SQL PASS: Category Metric Largest single database (DW) 80 TB Largest table 20 TB [...]]]></description>
			<content:encoded><![CDATA[<p>One of the common <a title="Fear, uncertainty and doubt" href="http://en.wikipedia.org/wiki/Fear,_uncertainty_and_doubt">FUD</a> techniques deployed against SQL Server is to raise questions about its ability to scale. Rather than blather on about technical reasons why this is bunkum, I&#8217;ll just hit you up with <a title="See the Largest Mission Critical Deployment of Microsoft SQL Server around the World" href="http://springssql.sqlpass.org/LinkClick.aspx?fileticket=GnIyYpOI6Ts%3D&amp;tabid=1223">some numbers from this springs SQL PASS</a>:</p>
<table border="1" cellspacing="0" cellpadding="4" width="463">
<colgroup span="1">
<col span="1" width="381"></col>
<col span="1" width="82"></col>
</colgroup>
<tbody>
<tr height="26">
<td dir="ltr" width="381" height="26"><b>Category</b></td>
<td dir="ltr" width="82"><b>Metric</b></td>
</tr>
<tr height="27">
<td dir="ltr" width="381" height="27">Largest single database (DW)</td>
<td dir="ltr" width="82">80 TB</td>
</tr>
<tr height="26">
<td dir="ltr" width="381" height="26">Largest table</td>
<td dir="ltr" width="82">20 TB</td>
</tr>
<tr height="26">
<td dir="ltr" width="381" height="26">Largest total data for a single customer</td>
<td dir="ltr" width="82">2.5 PB</td>
</tr>
<tr height="26">
<td dir="ltr" width="381" height="26">Highest transactions per second, single OLTP database</td>
<td dir="ltr" width="82">36,000</td>
</tr>
<tr height="26">
<td dir="ltr" width="381" height="26">Fastest I/O subsystem in production</td>
<td dir="ltr" width="82">18 GB/sec</td>
</tr>
<tr height="26">
<td dir="ltr" width="381" height="26">Time to load 1 TB of data</td>
<td dir="ltr" width="82">20 minutes</td>
</tr>
<tr height="26">
<td dir="ltr" width="381" height="26">Largest cube</td>
<td dir="ltr" width="82">4.2 TB</td>
</tr>
</tbody>
</table>
<p>So, seriously, unless you have volumes of data on a par with MySpace&#8230; scalability is not an issue.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/12/sql-server-bi-dw-scalability/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

