<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BI Monkey</title>
	<atom:link href="http://www.bimonkey.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bimonkey.com</link>
	<description>James Beresford on Microsoft BI and Consulting in Sydney, Australia</description>
	<lastBuildDate>Mon, 23 Jan 2012 22:01:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>I&#8217;m presenting at SQL Server User Group on Feb 14th</title>
		<link>http://www.bimonkey.com/2012/01/im-presenting-at-sql-server-user-group-on-feb-14th/</link>
		<comments>http://www.bimonkey.com/2012/01/im-presenting-at-sql-server-user-group-on-feb-14th/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 22:01:56 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Presentations]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1162</guid>
		<description><![CDATA[If you’re in Sydney on Valentine’s day and don’t like your wife / partner all that much, you can come and spend time with something you really love – SQL Server! I’m presenting at the SQL Server User Group on Feb 14th – the session title is “Introducing SQL Server Data Quality Services, plus what’s [...]]]></description>
			<content:encoded><![CDATA[<p>If you’re in Sydney on Valentine’s Day and don’t like your wife / partner all that much, you can come and spend time with something you <em>really</em> love – SQL Server! I’m presenting at the SQL Server User Group on Feb 14<sup>th</sup> – the session title is “<strong>Introducing SQL Server Data Quality Services, plus what’s new in SQL2012 SSIS</strong>”. Please register at: <a href="http://www.sqlserver.org.au/Events/ViewEvent.aspx?EventId=577">http://www.sqlserver.org.au/Events/ViewEvent.aspx?EventId=577</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2012/01/im-presenting-at-sql-server-user-group-on-feb-14th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Server Data Quality Services in SQL2012 RC0 &#8211; Part 2</title>
		<link>http://www.bimonkey.com/2012/01/sql-server-data-quality-services-in-sql2012-rc0-part-2/</link>
		<comments>http://www.bimonkey.com/2012/01/sql-server-data-quality-services-in-sql2012-rc0-part-2/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 09:03:39 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Integration Services]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1150</guid>
		<description><![CDATA[Since my last post on SSDQS I&#8217;ve been in touch with the development team who have raised some suggestions and workarounds to improve performance. This post will focus on that feedback and how effective it is in reducing execution times. SSIS vs DQS Client Cleansing The first bit of feedback was that interactive cleansing through [...]]]></description>
			<content:encoded><![CDATA[<p>Since my last post on SSDQS I&#8217;ve been in touch with the development team who have raised some suggestions and workarounds to improve performance. This post will focus on that feedback and how effective it is in reducing execution times.</p>
<h2>SSIS vs DQS Client Cleansing</h2>
<p>The first bit of feedback was that interactive cleansing through the DQS Client was known to be faster than SSIS interaction &#8211; so my first instinct was to test just how much faster it was &#8211; and I was surprised &#8211; <strong>the speedup was around fivefold</strong>. The chart below shows my results for processing 10,000 rows with 1-5 Domains:</p>
<div id="attachment_1151" class="wp-caption alignnone" style="width: 302px"><a href="http://www.bimonkey.com/wp-content/uploads/2012/01/ssdqs_ssis_performance_3.png"><img class="size-full wp-image-1151" title="Fig 1: SSDQS SSIS vs Client performance" src="http://www.bimonkey.com/wp-content/uploads/2012/01/ssdqs_ssis_performance_3.png" alt="SSDQS SSIS vs Client performance" width="292" height="164" /></a><p class="wp-caption-text">SSDQS SSIS vs Client performance</p></div>
<p>If this scaled, then my 3.9 hour estimate for a 1m row / 10 Domain process would shrink to around 0.8 hours (3.9 / 5). Still not ideal, but getting closer to a production-viable speed.</p>
<p>Now, the reason behind this &#8211; as explained by the DQS team &#8211; is that the component sends discrete chunks through to validate (1,000 rows at a time as far as I can tell) which the DQS Server then passes back &#8211; which adds overhead and is inefficient for the DQS Server. This is done so that the DQS Cleansing component is not a blocking component. However, as far as I can tell, at this point it&#8217;s not possible to have any control over the size of these chunks.</p>
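<p>To make the chunking overhead a little more concrete, here is a minimal Python sketch of how many round trips a given chunk size implies. It does not call DQS or SSIS at all; <code>chunked</code> and <code>round_trips</code> are hypothetical helper names, and the 1,000 row chunk size is only what I observed, not a documented setting.</p>

```python
import math
from typing import Iterator, List, Sequence


def chunked(rows: Sequence[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most chunk_size rows (hypothetical helper)."""
    for start in range(0, len(rows), chunk_size):
        yield list(rows[start:start + chunk_size])


def round_trips(row_count: int, chunk_size: int) -> int:
    """Number of chunks, and so server round trips, needed for row_count rows."""
    return math.ceil(row_count / chunk_size)


# 10,000 dummy rows split into the observed (assumed) 1,000 row chunks.
rows = list(range(10_000))
batches = list(chunked(rows, 1_000))
print(len(batches), "chunks for 10,000 rows")
print(round_trips(1_000_000, 1_000), "round trips for 1,000,000 rows")
```

<p>Each round trip carries some fixed cost, so fewer, larger chunks should mean less overhead &#8211; at the price of the component holding on to more rows before passing them downstream.</p>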
<h2>Speeding up SSIS processing</h2>
<p>The next bit of feedback was to suggest breaking up the work to improve throughput. There are two ways of doing this &#8211; the first is to split up the domain processing and the second is to break up the data into chunks and process them in parallel. So I tried this by splitting it up in the following ways:</p>
<ol>
<li>5 Domains through a single DQS Cleansing Task &#8211; 10,000 rows</li>
<li>Each domain through a dedicated DQS Cleansing Task &#8211; 10,000 rows</li>
<li>5 Domains through 5 dedicated DQS Cleansing Tasks &#8211; 2,000 rows each &#8211; 10,000 total</li>
</ol>
<p>To be honest, the results weren&#8217;t overwhelming:</p>
<ol>
<li>Untuned: 94s</li>
<li>Separated Domains: 86s</li>
<li>Separated Data: 78s</li>
</ol>
<p>Given that separating the Domains means the data would, in a real situation, have to be split up and recombined, there&#8217;s probably not enough saving there (about 9%) to make that approach worthwhile. Splitting up the data yielded a processing time saving of about 17% (94s down to 78s) &#8211; nice, but not enough to be really useful given how long it takes normally.</p>
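<p>A quick check of the arithmetic behind those savings, using only the three timings listed above. The <code>saving_percent</code> helper is just a name made up for this sketch.</p>

```python
# Timings in seconds, copied from the results list above.
timings = {
    "Untuned": 94,
    "Separated Domains": 86,
    "Separated Data": 78,
}


def saving_percent(baseline: float, tuned: float) -> float:
    """Return the time saved relative to the baseline, as a percentage."""
    return (baseline - tuned) / baseline * 100


baseline = timings["Untuned"]
for name, seconds in timings.items():
    saving = saving_percent(baseline, seconds)
    print(f"{name}: {seconds}s, saving {saving:.1f}% against untuned")
```

<p>These are single runs on one VM, so the percentages are indicative rather than a benchmark.</p>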
<h2>Practical suggestions for the DQS Team</h2>
<p>A direct quote from the DQS team&#8217;s mail to me was &#8220;DQS is designed to best perform on large chunks.&#8221; Looking at the SSIS logs, it&#8217;s only sending 1,000 rows at a time &#8211; which is clearly suboptimal for DQS + SSIS to interact effectively. So there are two options available for a fix based on my understanding:</p>
<ol>
<li>Make the component configurable to send larger chunks &#8211; with a more SSIS-like 10,000 row default</li>
<li>Make the component optionally blocking</li>
</ol>
<p>The first just makes sense, and I doubt it would be a massive job to make &#8220;Rows To DQS Server&#8221; a configurable property. The second may be harder &#8211; and could probably be emulated just by setting the new &#8220;Rows To DQS Server&#8221; property to zero or a very high number.</p>
<p>In practice it&#8217;s still a bit slow for very heavy DW workloads, but hopefully the above suggestions would give it a real boost in performance and make it viable for mid-sized ones.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2012/01/sql-server-data-quality-services-in-sql2012-rc0-part-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>BI &amp; DW Australia</title>
		<link>http://www.bimonkey.com/2011/12/bi-dw-australia/</link>
		<comments>http://www.bimonkey.com/2011/12/bi-dw-australia/#comments</comments>
		<pubDate>Wed, 14 Dec 2011 22:38:52 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[Recruitment]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1141</guid>
		<description><![CDATA[Last week I spent some time in the company of the gents from BI &#38; DW Australia, a specialist recruitment firm here in Oz for the BI &#38; DW industry who also run a LinkedIn group focused on &#8230; well, BI &#38; DW in Australia. So, first up &#8211; if you are an Aussie BI person, [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I spent some time in the company of the gents from <a title="BI &amp; DW Australia" href="http://www.bidwaustralia.com/" target="_blank">BI &amp; DW Australia</a>, a specialist recruitment firm here in Oz for the BI &amp; DW industry who also <a title="BI &amp; DW Australia LinkedIn group" href="http://www.linkedin.com/groups?gid=1834854" target="_blank">run a LinkedIn group</a> focused on &#8230; well, BI &amp; DW in Australia. So, first up &#8211; if you are an Aussie BI person, go sign up to the group &#8211; it&#8217;s <em>not</em> a recruitment hub but a valuable localised forum &#8211; a good way to get across some of the &#8220;names&#8221; in the BI Industry here as well as hear about locally relevant topics. This isn&#8217;t a paid plug &#8211; I believe they genuinely add something to the BI community here &#8211; and anyway I think we ended up about even on the bar bills <img src='http://www.bimonkey.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>One of the topics we got on to was niche marketing &#8211; as this is exactly what they have done &#8211; pick a niche, focus on it and get good at it to build a good brand. No different to what Avanade do &#8211; their niche is certain areas of Enterprise IT (which admittedly is a whacking great big niche) but it&#8217;s an area of focus all the same.</p>
<h2>Recruiter Keyword Spam makes me do what?</h2>
<p>If you have no niche or market specialisation then you will be unfocused on your customers or market and this will become apparent when you deal with either. This manifests in recruitment with our friend keyword spam. I have a bit of SAS experience on my CV &#8211; which I last actively used in 2001 &#8211; and am quite clearly, based on experience and work history over the last <strong>decade</strong>, a Microsoft / Enterprise BI guy now, with no interest in SAS jobs. Nor would anyone hire me for one. But still I get spammed by unfocused recruiters who keyword search and send out jobs without accounting for my experience and skills. What impression does this give me? That as a recruiter you don&#8217;t know what you&#8217;re doing either in respect of me or your client. What behaviour does this drive? It drives me to add you to my junk mailer list and thus I will likely never deal with you again.</p>
<p>Conversely &#8211; if you know your market and your customers &#8211; any interaction will be beneficial, or at least likely enough to be that your customer will think your input as a vendor is worth taking the time out to hear. If the BI &amp; DW guys called I&#8217;d answer because I know they would have something interesting to offer. Most recruiters go straight to voicemail and then the trash.</p>
<p>So apologies for a bit of a rant &#8211; but bad marketing annoys me &#8211; particularly when it&#8217;s aimed indiscriminately at me. In summary &#8211; go join the <a title="BI &amp; DW Australia LinkedIn group" href="http://www.linkedin.com/groups?gid=1834854" target="_blank">BI &amp; DW LinkedIn group</a> &#8211; and don&#8217;t offer me SAS jobs.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/12/bi-dw-australia/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SQL Server Data Quality Services in SQL2012 RC0 &#8211; Part 1</title>
		<link>http://www.bimonkey.com/2011/12/sql-server-data-quality-services-in-sql2012-rc0-part-1/</link>
		<comments>http://www.bimonkey.com/2011/12/sql-server-data-quality-services-in-sql2012-rc0-part-1/#comments</comments>
		<pubDate>Wed, 14 Dec 2011 00:55:36 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Integration Services]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1132</guid>
		<description><![CDATA[So the key news &#8211; in case you missed it &#8211; is that SQL2012 RC0 has been made available for download. After a few battles with the Installer &#8211; first the known issue with the Distributed Replay users &#8211; then some things requiring manual installs of KB&#8217;s to get the installer to run through &#8211; [...]]]></description>
			<content:encoded><![CDATA[<p>So the key news &#8211; in case you missed it &#8211; is that <a title="Microsoft® SQL Server® 2012 Release Candidate 0 (RC0)" href="http://www.microsoft.com/download/en/details.aspx?id=28145">SQL2012 RC0 has been made available for download</a>. After a few battles with the Installer &#8211; first <a title="SQL Server 2012 : A couple of notes about installing RC0 " href="http://sqlblog.com/blogs/aaron_bertrand/archive/2011/11/19/sql-server-2012-a-couple-of-notes-about-installing-rc0.aspx">the known issue with the Distributed Replay users</a> &#8211; then some things requiring manual installs of KB&#8217;s to get the installer to run through &#8211; I have a VM set up with it.</p>
<p>The DQS team have <a title="SQL Server 2012 RC0 - What's New in DQS?" href="http://blogs.msdn.com/b/dqs/archive/2011/11/29/sql-server-2012-rc0-what-s-new-in-dqs.aspx">posted about the improvements made in the DQS blog</a> &#8211; and the one I really wanted to focus on was performance via SSIS as the CTP3 offering was not viable for large data sets. So this Part 1 post is all about the performance of DQS via SSIS in RC0.</p>
<p>So, I set up a Knowledge Base in the same way as I did for testing CTP3, with 5 duplicate domains &#8211; just evaluating an Integer with a single rule saying that the integer had to be greater than a value to be valid. Then I ran two sets of values (5k &amp; 10k rows) through the KB via SSIS, evaluating 1, 2, 3, 4 and 5 fields.</p>
<h2>So how does DQS Perform?</h2>
<p>Here are the results &#8211; the value in the grid is the number of seconds taken to process.</p>
<div id="attachment_1134" class="wp-caption alignnone" style="width: 541px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/12/ssdqs_ssis_performance_2.png"><img class=" wp-image-1134" title="DQS Performance in SSIS" src="http://www.bimonkey.com/wp-content/uploads/2011/12/ssdqs_ssis_performance_2.png" alt="DQS Performance in SSIS" width="531" height="169" /></a><p class="wp-caption-text">DQS Performance in SSIS</p></div>
<p>So &#8211; have we moved on from CTP3? A bit. But not much, and little enough to be accounted for by a different VM setup (as a reminder, CTP3 processing of 5k rows took from 20 to 45 seconds for 1-5 columns). I accept a VM may be slower than a properly configured server, but even if it were twice as quick it would still not be a viable option for industrial use.</p>
<p>Looking at execution time changes by number of columns / rows processed, the time taken seems to be pretty linear as rows and columns increase, so it appears DQS performance can be evaluated pretty much as:</p>
<blockquote><p><strong>DQS Execution Time</strong> = Spin Up Time + (Columns * (Rows * Row Process Time))</p>
<p>Where:</p>
<p><strong>Spin Up Time</strong> = time taken for DQS engine to start (Constant)</p>
<p><strong>Columns</strong> = number of columns being evaluated (Variable)</p>
<p><strong>Rows</strong> = number of rows being processed (Variable)</p>
<p><strong>Row Process Time</strong> = time taken to process a single row (Constant)</p></blockquote>
<p>On my VM, Spin Up Time seems to be 7 seconds, and Row Process Time = 0.0014 seconds.</p>
<p>So, if we had to validate 10 columns on 1,000,000 rows of data (not too crazy):</p>
<blockquote><p><strong>DQS Execution Time</strong> = Spin Up Time + (Columns * (Rows * Row Process Time))</p>
<p>DQS Execution Time = 7 + (10 * (1,000,000 * 0.0014))</p>
<p>DQS Execution Time = 7 + (10 * 1,400)</p>
<p><strong>DQS Execution Time</strong> = 14007 seconds = 233 minutes = <strong>3.9 hours</strong></p></blockquote>
<p>Which effectively rules it out as a viable production process. Note of course that my formula doesn&#8217;t make any allowance for rule complexity.</p>
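<p>For anyone who wants to plug in their own numbers, the formula above can be sketched in a few lines of Python. The two constants are simply the values measured on my VM, so treat them as assumptions for any other setup; the function name <code>dqs_execution_seconds</code> is made up for this example.</p>

```python
# Constants measured on one VM; assumptions for any other environment.
SPIN_UP_SECONDS = 7.0
ROW_PROCESS_SECONDS = 0.0014


def dqs_execution_seconds(
    columns: int,
    rows: int,
    spin_up: float = SPIN_UP_SECONDS,
    row_time: float = ROW_PROCESS_SECONDS,
) -> float:
    """Spin Up Time + (Columns * (Rows * Row Process Time))."""
    return spin_up + columns * rows * row_time


# The 10 column / 1,000,000 row case worked through above.
seconds = dqs_execution_seconds(columns=10, rows=1_000_000)
print(f"{seconds:.0f} seconds = {seconds / 60:.0f} minutes = {seconds / 3600:.1f} hours")
```

<p>Like the formula itself, this assumes time scales linearly with rows and columns and makes no allowance for rule complexity.</p>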
<h2>Is DQS Production ready?</h2>
<p>As with anything, the answer is &#8211; it depends. For validating small data sets it&#8217;s in the realms of slow, but probably acceptable. For big data sets, I&#8217;d have to say no &#8211; I couldn&#8217;t use it in a production environment to validate large sets of data. I&#8217;ve <a title="Data Quality Services / DQS Cleansing Component performance too slow" href="https://connect.microsoft.com/SQLServer/feedback/details/713837/data-quality-services-dqs-cleansing-component-performance-too-slow">added a Connect suggestion</a> to get this on the team&#8217;s radar.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/12/sql-server-data-quality-services-in-sql2012-rc0-part-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Microsoft BI and &#8230; SAP?</title>
		<link>http://www.bimonkey.com/2011/11/microsoft-bi-and-sap/</link>
		<comments>http://www.bimonkey.com/2011/11/microsoft-bi-and-sap/#comments</comments>
		<pubDate>Fri, 25 Nov 2011 10:22:48 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[SAP]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1123</guid>
		<description><![CDATA[SAP is without doubt one of the biggest ERP systems in the world. Microsoft has its own offering, Dynamics &#8211; but that&#8217;s by the by. SAP also offers plenty of BI tools as part of its offering &#8211; arguably too many. What&#8217;s less known is it&#8217;s quite possible to overlay Microsoft BI over the top [...]]]></description>
			<content:encoded><![CDATA[<p><a title="SAP" href="http://www.sap.com">SAP</a> is without doubt one of the biggest <a title="Enterprise Resource Planning" href="http://en.wikipedia.org/wiki/Enterprise_resource_planning">ERP</a> systems in the world. Microsoft has its own offering, <a title="Microsoft Dynamics" href="http://www.microsoft.com/en-us/dynamics/default.aspx">Dynamics</a> &#8211; but that&#8217;s by the by.</p>
<p>SAP also offers <a title="SAP BI" href="http://www.sap.com/solutions/sapbusinessobjects/large/business-intelligence/index.epx">plenty of BI tools</a> as part of its offering &#8211; arguably too many. What&#8217;s less well known is that it&#8217;s quite possible to overlay Microsoft BI over the top of it. After all, an ERP system is still just a source of data, and BI is all about making sense of data. How practical is it? Well, I know that at <a title="Avanade" href="http://www.avanade.com/">Avanade</a> we&#8217;ve done one of the world&#8217;s biggest implementations. And <a title="Microsoft Case Study of MS BI on SAP" href="http://www.microsoft.com/casestudies/Microsoft-SQL-Server-Management-Studio/Sysmex-America-Inc/Life-Sciences-Firm-Enhances-Performance-Management-Saves-Millions-with-Microsoft-BI/4000008275">here is a Microsoft case study</a> where the same has been done.</p>
<p>So the obvious question then becomes &#8211; why bother? And there are two very clear drivers for that:</p>
<ul>
<li>User empowerment</li>
<li>Cost</li>
</ul>
<p>To expand on each &#8211; User Empowerment comes from the greater ease of use that MS tools offer, and the self-service capabilities of PowerView &amp; PowerPivot, SharePoint and our old friend Excel. That of course leads to greater adoption and a more successful move to a data-driven organisation.</p>
<p>Cost &#8211; well, SAP ain&#8217;t cheap. Even with the costs involved in building an alternative solution (and there are people out there who can give you a head start) &#8211; Microsoft&#8217;s licensing model means that you can reach more users at a lower cost.</p>
<p>So now you know.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/11/microsoft-bi-and-sap/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microsoft Download Manager</title>
		<link>http://www.bimonkey.com/2011/11/microsoft-download-manager/</link>
		<comments>http://www.bimonkey.com/2011/11/microsoft-download-manager/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 21:38:35 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[Download Manager]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1115</guid>
		<description><![CDATA[Sharing a small discovery I made while hunting for a Download Manager to help me pull down a 40 GB VM, Microsoft have their own available here. It&#8217;s small and simple but has the important feature for me: it supports Domain based Windows Authentication, which means I can connect to my corporate domain using my [...]]]></description>
			<content:encoded><![CDATA[<p>Sharing a small discovery I made while hunting for a Download Manager to help me pull down a 40 GB VM: Microsoft have their own, <a title="Microsoft Download Manager" href="http://www.microsoft.com/download/en/details.aspx?id=26214">available here</a>.</p>
<p>It&#8217;s small and simple but has the important feature for me: it supports Domain-based Windows Authentication, which means I can connect to my corporate domain using my Windows logon and download. Other download managers struggled with this. Plus, no adware.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/11/microsoft-download-manager/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BI Monkey Tweets</title>
		<link>http://www.bimonkey.com/2011/10/bi-monkey-tweets/</link>
		<comments>http://www.bimonkey.com/2011/10/bi-monkey-tweets/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 23:50:50 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[BI Monkey News]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1113</guid>
		<description><![CDATA[I started tweeting at SQL Pass &#8211; @BI_Monkey &#8211; and think I like it! Please follow me if you are a fellow Twitter user &#8211; I will return the favour!]]></description>
			<content:encoded><![CDATA[<p>I started tweeting at SQL Pass &#8211; @BI_Monkey &#8211; and think I like it! Please follow me if you are a fellow Twitter user &#8211; I will return the favour!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/10/bi-monkey-tweets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Pass Day #3</title>
		<link>http://www.bimonkey.com/2011/10/sql-pass-day-3/</link>
		<comments>http://www.bimonkey.com/2011/10/sql-pass-day-3/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 23:46:55 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Hadoop]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1108</guid>
		<description><![CDATA[The third and final day at SQL Pass was presaged by me at the bloggers table (though only able to manically tweet) watching Dr DeWitt&#8217;s keynote &#8211; and I can see why his keynotes are so highly regarded. His subject was Big Data &#8211; and given the potential for this to be a dull and [...]]]></description>
			<content:encoded><![CDATA[<p>The third and final day at SQL Pass started with me at the bloggers&#8217; table (though only able to manically tweet) watching Dr DeWitt&#8217;s keynote &#8211; and I can see why his keynotes are so highly regarded. His subject was Big Data &#8211; and despite the potential for this to be a dull and impenetrable subject area, he gave a great and illuminating talk on the topic.</p>
<p>Topics that he covered included:</p>
<ul>
<li>ACID vs BASE (i.e. the battle between consistency of data vs. availability of data)</li>
<li>NoSQL is a means of querying raw data with no cleansing / structure / ETL</li>
<li>His expectation is that Structured (SQL) and Unstructured (Hadoop) data will coexist in organisations</li>
<li>Hadoop consists of Storage (HDFS) and Process (MapReduce)</li>
<li>MapReduce is too complex to work with so languages such as Hive and Pig sit on top of it</li>
<li>Sqoop is the tool to make Unstructured and Structured data talk &#8211; but performance is not good</li>
</ul>
<p>I can&#8217;t really do his talk justice but now I understand Hadoop a whole lot better &#8211; essentially it&#8217;s just a read-only store of unstructured data, a very different beast to a relational database and addressing totally different needs.</p>
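<p>To give a feel for the MapReduce half of that picture, here is a toy word count in plain Python. It is only a sketch of the map / shuffle / reduce idea on a single machine; it does not use Hadoop, HDFS, Hive or Pig, and the function names are made up for illustration.</p>

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one line of raw text."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


# Two lines of unstructured text standing in for files in a distributed store.
lines = ["big data is big", "data is data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

<p>In a real cluster the map and reduce steps run in parallel across many machines, which is where both the power and the complexity come from.</p>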
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/10/sql-pass-day-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Pass Day #2</title>
		<link>http://www.bimonkey.com/2011/10/sql-pass-day-2/</link>
		<comments>http://www.bimonkey.com/2011/10/sql-pass-day-2/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 00:54:17 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Denali]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[SQL Pass 2011]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1104</guid>
		<description><![CDATA[So, on to day 2 of SQL Pass, and a very SSIS focused one &#8211; mainly because I attended Matt Masson&#8217;s SSIS session and learned about a whole new bunch of nice features that have made it in to the next version. I took away the following interesting points from that session: CDC (Change Data [...]]]></description>
			<content:encoded><![CDATA[<p>So, on to day 2 of SQL Pass, and a very SSIS focused one &#8211; mainly because I attended Matt Masson&#8217;s SSIS session and learned about a whole new bunch of nice features that have made it in to the next version.</p>
<p>I took away the following interesting points from that session:</p>
<ol>
<li><strong>CDC (Change Data Capture)</strong> is supported more effectively through some new components &#8211; a CDC Control, CDC Source and CDC Splitter</li>
<li><strong>ODBC </strong>improvements mean improved performance for non SQL Server databases</li>
<li><strong>Connection Managers </strong>get a few new features &#8211; Offline, Expression and Project indicators. Also Offline Connection Managers are picked out through a timeout and importantly now <strong>halt validation</strong> of any related components (so you no longer get those drags as SSIS tries to validate components and flows hooked to dead connections)</li>
<li><strong>Flat File Connection Manager</strong> can now handle variable numbers of columns (i.e. it won&#8217;t crash)</li>
<li><strong>Pivot </strong>gets a UI &#8211; hurrah! (Note: SCD still sucks)</li>
<li>Project <strong>Parameters </strong>can be configured at design time through Visual Studio Configurations</li>
<li><strong>Breakpoints </strong>are now in the Script Component so we can see what data is causing our components to blow up</li>
<li><strong>Data Taps</strong> &#8211; data viewers for live execution that dump out to .csv files</li>
<li>Package <strong>Execution </strong>via PowerShell or even T-SQL!</li>
</ol>
<p>Matt also gave a preview of Barcelona in action, and it looks pretty neat.</p>
<p>I also attended a DQS session that showed a few new features on the UI. Elad Ziklik highlighted that the CTP3 release should be viewed as a capability preview rather than a test drive of a functional product &#8211; so I&#8217;m looking forward to a new CTP.</p>
<p>One more day down &#8211; one to go&#8230;!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/10/sql-pass-day-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Pass Day #1</title>
		<link>http://www.bimonkey.com/2011/10/sql-pass-day-1/</link>
		<comments>http://www.bimonkey.com/2011/10/sql-pass-day-1/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 00:28:36 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Denali]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[SQL Pass 2011]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1091</guid>
		<description><![CDATA[So the BI Monkey was at PASS and intended blogging on his phone or at one of the many internet pods provided, but sadly WordPress and IE / Android aren&#8217;t friends so this is going to be a retrospective. So, here are my takeaways from day 1: From the keynote: Denali becomes SQL2012 and is [...]]]></description>
			<content:encoded><![CDATA[<p>So the BI Monkey was at PASS and intended to blog on his phone or at one of the many internet pods provided, but sadly WordPress and IE / Android aren&#8217;t friends so this is going to be a retrospective.</p>
<p>So, here are my takeaways from day 1:</p>
<p><em>From the keynote:</em></p>
<p>Denali becomes SQL2012 and is slated for release in the first half of next year (way to allow yourself some leeway on the final release date!)</p>
<p>Crescent becomes Power View &#8211; and will work on multiple mobile platforms &#8211; Apple &amp; Android via Browser, WP7 via a richer app</p>
<p>Hadoop is going to be supported in Windows Server &#8211; the first CTP is due next year. If you have access to the PASS DVDs / Sessions &#8211; see Dr DeWitt&#8217;s presentation on Hadoop &#8211; very enlightening</p>
<p><em>From wandering around the summit:</em></p>
<p>As per Elad Ziklik of the DQS team, performance is one of their focuses for the next release</p>
<p>I got to finally meet Ivan Peev of <a title="CozyRoc" href="http://www.cozyroc.com/">CozyRoc</a> after many years of email and phone conversations, and he told me about their new SAS Connector &#8211; which allows reading and writing to SAS datasets without an actual SAS install.</p>
<p>I also got to meet Matt Masson, guru of the SSIS team who told me about <a title="Project Barcelona Team Blog" href="http://blogs.msdn.com/b/project_barcelona_team_blog/">Project Barcelona</a> &#8211; a tool that will do data lineage, metadata management and impact analysis via a crawler as opposed to a manually maintained set.</p>
<p>I also got to sit in on a customer feedback session about BI in the Cloud &#8211; unfortunately all under NDA &#8211; but it was a great forum to discuss and help direct Microsoft&#8217;s Cloud BI ambitions.</p>
<p>I also had a chat with fellow Aussie BI guy <a title="Roger Noble" href="http://www.rogernoble.com/">Roger Noble</a> who told me about a use for the Term Extraction transformation in SSIS &#8211; using it to scan through documents and auto-tag them as they were uploaded to SharePoint &#8211; which is pretty cool.</p>
<p>So, that was Day 1&#8230; Day 2 to follow!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/10/sql-pass-day-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

