<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BI Monkey &#187; Integration Services</title>
	<atom:link href="http://www.bimonkey.com/category/microsoft-bi/ssis/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bimonkey.com</link>
	<description>James Beresford on Microsoft BI and Consulting in Sydney, Australia</description>
	<lastBuildDate>Mon, 23 Jan 2012 22:01:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>I&#8217;m presenting at SQL Server User Group on Feb 14th</title>
		<link>http://www.bimonkey.com/2012/01/im-presenting-at-sql-server-user-group-on-feb-14th/</link>
		<comments>http://www.bimonkey.com/2012/01/im-presenting-at-sql-server-user-group-on-feb-14th/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 22:01:56 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Presentations]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1162</guid>
		<description><![CDATA[If you’re in Sydney on Valentine’s day and don’t like your wife / partner all that much, you can come and spend time with something you really love – SQL Server! I’m presenting at the SQL Server User Group on Feb 14th – the session title is “Introducing SQL Server Data Quality Services, plus what’s [...]]]></description>
			<content:encoded><![CDATA[<p>If you’re in Sydney on Valentine’s day and don’t like your wife / partner all that much, you can come and spend time with something you <em>really</em> love – SQL Server! I’m presenting at the SQL Server User Group on Feb 14<sup>th</sup> – the session title is “<strong>Introducing SQL Server Data Quality Services, plus what’s new in SQL2012 SSIS</strong>”. Please register at: <a href="http://www.sqlserver.org.au/Events/ViewEvent.aspx?EventId=577">http://www.sqlserver.org.au/Events/ViewEvent.aspx?EventId=577</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2012/01/im-presenting-at-sql-server-user-group-on-feb-14th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Server Data Quality Services in SQL2012 RC0 &#8211; Part 2</title>
		<link>http://www.bimonkey.com/2012/01/sql-server-data-quality-services-in-sql2012-rc0-part-2/</link>
		<comments>http://www.bimonkey.com/2012/01/sql-server-data-quality-services-in-sql2012-rc0-part-2/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 09:03:39 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Integration Services]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1150</guid>
		<description><![CDATA[Since my last post on SSDQS I&#8217;ve been in touch with the development team who have raised some suggestions and workarounds to improve performance. This post will focus on that feedback and how effective it is in reducing execution times. SSIS vs DQS Client Cleansing The first bit of feedback was that interactive cleansing through [...]]]></description>
			<content:encoded><![CDATA[<p>Since my last post on SSDQS I&#8217;ve been in touch with the development team who have raised some suggestions and workarounds to improve performance. This post will focus on that feedback and how effective it is in reducing execution times.</p>
<h2>SSIS vs DQS Client Cleansing</h2>
<p>The first piece of feedback was that interactive cleansing through the DQS Client is known to be faster than cleansing through SSIS, so my first instinct was to test just how much faster. I was surprised: <strong>the speedup was around fivefold</strong>. The chart below shows my results for processing 10,000 rows with 1-5 Domains:</p>
<div id="attachment_1151" class="wp-caption alignnone" style="width: 302px"><a href="http://www.bimonkey.com/wp-content/uploads/2012/01/ssdqs_ssis_performance_3.png"><img class="size-full wp-image-1151" title="Fig 1: SSDQS SSIS vs Client performance" src="http://www.bimonkey.com/wp-content/uploads/2012/01/ssdqs_ssis_performance_3.png" alt="SSDQS SSIS vs Client performance" width="292" height="164" /></a><p class="wp-caption-text">SSDQS SSIS vs Client performance</p></div>
<p>If this scaled, my 3.9-hour estimate for a 1m row / 10 Domain process would shrink to under an hour (roughly 47 minutes). Still not ideal, but getting closer to a production-viable speed.</p>
<p>The reason behind this, as explained by the DQS team, is that the component sends the data to the DQS Server in discrete chunks (1,000 rows at a time, as far as I can tell), each of which the server validates and passes back. This adds overhead and is inefficient for the DQS Server, but it is done so that the DQS Cleansing component is not a blocking component. As far as I can tell, there is currently no way to control the size of these chunks.</p>
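<p>To put the chunking in numbers, here is a minimal Python sketch that counts how many round trips to the DQS Server a fixed chunk size implies. It assumes the component simply sends consecutive batches of a fixed size (1,000 rows, per the SSIS logs); the function names are my own, and this is not how the component is actually implemented.</p>

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_rows(rows: Sequence[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most chunk_size rows (assumed behaviour)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive number of rows")
    for start in range(0, len(rows), chunk_size):
        yield list(rows[start:start + chunk_size])


def round_trips(row_count: int, chunk_size: int) -> int:
    """Number of batches (server round trips) needed to send row_count rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive number of rows")
    return math.ceil(row_count / chunk_size)


if __name__ == "__main__":
    for rows in (10_000, 1_000_000):
        for size in (1_000, 10_000):
            print(f"{rows:>9,} rows at {size:>6,} per chunk: "
                  f"{round_trips(rows, size):>5,} round trips")
```

<p>At 1,000 rows per chunk, a 1,000,000 row load means 1,000 separate round trips, versus 100 at a 10,000 row chunk size. If each trip carries a fixed overhead, that overhead is paid ten times as often at the smaller size.</p>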
<h2>Speeding up SSIS processing</h2>
<p>The next suggestion was to break up the work to improve throughput. There are two ways of doing this: split up the domain processing, or break the data into chunks and process them in parallel. So I tried splitting the work in the following ways:</p>
<ol>
<li>5 Domains through a single DQS Cleansing Task &#8211; 10,000 rows</li>
<li>Each domain through a dedicated DQS Cleansing Task &#8211; 10,000 rows</li>
<li>5 Domains through 5 dedicated DQS Cleansing Tasks &#8211; 2,000 rows each &#8211; 10,000 total</li>
</ol>
<p>To be honest, the results weren&#8217;t overwhelming:</p>
<ol>
<li>Untuned: 94s</li>
<li>Separated Domains: 86s</li>
<li>Separated Data: 78s</li>
</ol>
<p>Given that separating the Domains would, in a real situation, mean the data has to be split up and recombined, there&#8217;s probably not enough saving there (under 10%) to make that approach worthwhile. Splitting up the data cut processing time by about 17% (roughly a 20% increase in throughput): nice, but not enough to be really useful given how long processing takes in the first place.</p>
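<p>For reference, here is the arithmetic behind those comparisons as a small Python sketch using the three timings above. It reports both the fraction of elapsed time saved and the relative throughput, since the same result reads differently depending on which one you quote.</p>

```python
from typing import Dict, Tuple

# Elapsed times (seconds) for 10,000 rows through 5 Domains, from the runs above.
TIMINGS: Dict[str, int] = {
    "Untuned": 94,
    "Separated Domains": 86,
    "Separated Data": 78,
}


def compare(baseline_s: float, tuned_s: float) -> Tuple[float, float]:
    """Return (fraction of elapsed time saved, throughput relative to baseline)."""
    time_saved = (baseline_s - tuned_s) / baseline_s
    speedup = baseline_s / tuned_s
    return time_saved, speedup


if __name__ == "__main__":
    baseline = TIMINGS["Untuned"]
    for name, seconds in TIMINGS.items():
        saved, speedup = compare(baseline, seconds)
        print(f"{name:18} {seconds:>3}s  time saved {saved:6.1%}  speedup {speedup:.2f}x")
```

<p>Separating the data comes out at about 17% less elapsed time, which is the same thing as running roughly 1.2 times faster.</p>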
<h2>Practical suggestions for the DQS Team</h2>
<p>A direct quote from the DQS team&#8217;s mail to me was &#8220;DQS is designed to best perform on large chunks.&#8221; Looking at the SSIS logs, the component is only sending 1,000 rows at a time, which is clearly suboptimal for DQS and SSIS to interact effectively. As I understand it, there are two options available for a fix:</p>
<ol>
<li>Make the component configurable to send larger chunks, with a more SSIS-like default of 10,000 rows</li>
<li>Make the component optionally blocking</li>
</ol>
<p>The first just makes sense, and I doubt making &#8220;Rows To DQS Server&#8221; a configurable property would be a massive job. The second may be harder, but could probably be replicated just by setting the new &#8220;Rows To DQS Server&#8221; property to zero or a very high number.</p>
<p>In practice it&#8217;s still a bit slow for very heavy DW workloads, but hopefully the above suggestions would give it a real boost in performance and make it viable for mid-sized ones.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2012/01/sql-server-data-quality-services-in-sql2012-rc0-part-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>SQL Server Data Quality Services in SQL2012 RC0 &#8211; Part 1</title>
		<link>http://www.bimonkey.com/2011/12/sql-server-data-quality-services-in-sql2012-rc0-part-1/</link>
		<comments>http://www.bimonkey.com/2011/12/sql-server-data-quality-services-in-sql2012-rc0-part-1/#comments</comments>
		<pubDate>Wed, 14 Dec 2011 00:55:36 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Integration Services]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1132</guid>
		<description><![CDATA[So the key news &#8211; in case you missed it &#8211; is that SQL2012 RC0 has been made available for download. After a few battles with the Installer &#8211; first the known issue with the Distributed Replay users &#8211; then some things requiring manual installs of KB&#8217;s to get the installer to run through &#8211; [...]]]></description>
			<content:encoded><![CDATA[<p>So the key news &#8211; in case you missed it &#8211; is that <a title="Microsoft® SQL Server® 2012 Release Candidate 0 (RC0)" href="http://www.microsoft.com/download/en/details.aspx?id=28145">SQL2012 RC0 has been made available for download</a>. After a few battles with the Installer &#8211; first <a title="SQL Server 2012 : A couple of notes about installing RC0 " href="http://sqlblog.com/blogs/aaron_bertrand/archive/2011/11/19/sql-server-2012-a-couple-of-notes-about-installing-rc0.aspx">the known issue with the Distributed Replay users</a> &#8211; then some things requiring manual installs of KB&#8217;s to get the installer to run through &#8211; I have a VM set up with it.</p>
<p>The DQS team have <a title="SQL Server 2012 RC0 - What's New in DQS?" href="http://blogs.msdn.com/b/dqs/archive/2011/11/29/sql-server-2012-rc0-what-s-new-in-dqs.aspx">posted about the improvements made in the DQS blog</a>, and the one I really wanted to focus on was performance via SSIS, as the CTP3 offering was not viable for large data sets. So this Part 1 post is all about the performance of DQS via SSIS in RC0.</p>
<p>So, I set up a Knowledge Base in the same way as I did for testing CTP3, with 5 duplicate domains, each just evaluating an integer with a single rule saying that the integer had to be greater than a given value to be valid. Then I ran two sets of values (5k &amp; 10k rows) through the KB via SSIS, evaluating 1, 2, 3, 4 and 5 fields.</p>
<h2>So how does DQS Perform?</h2>
<p>Here are the results; the value in the grid is the number of seconds taken to process.</p>
<div id="attachment_1134" class="wp-caption alignnone" style="width: 541px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/12/ssdqs_ssis_performance_2.png"><img class=" wp-image-1134" title="DQS Performance in SSIS" src="http://www.bimonkey.com/wp-content/uploads/2011/12/ssdqs_ssis_performance_2.png" alt="DQS Performance in SSIS" width="531" height="169" /></a><p class="wp-caption-text">DQS Performance in SSIS</p></div>
<p>So, have we moved on from CTP3? A bit, but not much, and not by more than could be accounted for by a different VM setup (as a reminder, CTP3 took from 20 to 45 seconds to process 5k rows for 1-5 columns). I accept a VM may be slower than a properly configured server, but even if a server were twice as fast, DQS would still not be a viable option for industrial use.</p>
<p>Looking at execution time changes by number of columns / rows processed, the time taken seems to be pretty linear as rows and columns increase, so it appears DQS performance can be evaluated pretty much as:</p>
<blockquote><p><strong>DQS Execution Time</strong> = Spin Up Time + (Columns * (Rows * Row Process Time))</p>
<p>Where:</p>
<p><strong>Spin Up Time</strong> = time taken for DQS engine to start (Constant)</p>
<p><strong>Columns</strong> = number of columns being evaluated (Variable)</p>
<p><strong>Rows</strong> = number of rows being processed (Variable)</p>
<p><strong>Row Process Time</strong> = time taken to process a single row (Constant)</p></blockquote>
<p>On my VM, Spin Up Time seems to be 7 seconds, and Row Process Time 0.0014 seconds.</p>
<p>So, if we had to validate 10 columns on 1,000,000 rows of data (not too crazy):</p>
<blockquote><p><strong>DQS Execution Time</strong> = Spin Up Time + (Columns * (Rows * Row Process Time))</p>
<p>DQS Execution Time = 7 + (10 * (1,000,000 * 0.0014))</p>
<p>DQS Execution Time = 7 + (10 * 1,400)</p>
<p><strong>DQS Execution Time</strong> = 14007 seconds = 233 minutes = <strong>3.9 hours</strong></p></blockquote>
<p>That effectively rules it out as a viable production process. Note, of course, that my formula makes no allowance for rule complexity.</p>
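<p>The model is simple enough to drop into a few lines of Python for what-if estimates. This is a minimal sketch: the default constants are the values measured on my VM, and like the formula itself it ignores rule complexity, so treat any output as a rough estimate only.</p>

```python
def dqs_execution_time(columns: int, rows: int,
                       spin_up_s: float = 7.0,
                       row_process_s: float = 0.0014) -> float:
    """Estimate DQS execution time in seconds with the linear model above.

    Defaults are the constants observed on my VM; rule complexity is ignored.
    """
    return spin_up_s + columns * (rows * row_process_s)


if __name__ == "__main__":
    seconds = dqs_execution_time(columns=10, rows=1_000_000)
    # prints: 14,007 seconds = 233 minutes = 3.9 hours
    print(f"{seconds:,.0f} seconds = {seconds / 60:,.0f} minutes = {seconds / 3600:.1f} hours")
```

<p>The point of parameterising the constants is to swap in values measured on your own hardware. With the VM figures, a 5,000 row, single column run comes out at about 14 seconds.</p>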
<h2>Is DQS Production ready?</h2>
<p>As with anything, the answer is: it depends. For validating small data sets it&#8217;s slow, but probably acceptable. For big data sets, I&#8217;d have to say no: I couldn&#8217;t use it in a production environment to validate large sets of data. I&#8217;ve <a title="Data Quality Services / DQS Cleansing Component performance too slow" href="https://connect.microsoft.com/SQLServer/feedback/details/713837/data-quality-services-dqs-cleansing-component-performance-too-slow">added a Connect suggestion</a> to get this on the team&#8217;s radar.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/12/sql-server-data-quality-services-in-sql2012-rc0-part-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Simple Data Quality Scoring with SSDQS &amp; SSIS</title>
		<link>http://www.bimonkey.com/2011/10/simple-data-quality-scoring-with-ssdqs-ssis/</link>
		<comments>http://www.bimonkey.com/2011/10/simple-data-quality-scoring-with-ssdqs-ssis/#comments</comments>
		<pubDate>Tue, 04 Oct 2011 23:48:12 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Denali]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1072</guid>
		<description><![CDATA[A common requirement in Data Warehousing is to apply a Data Quality &#8220;score&#8221; to records as they come in. The score is then used to identify and filter or fix bad data coming in depending on its assigned quality. A practical example of this might be that in a Customer Address record, a missing Postcode [...]]]></description>
			<content:encoded><![CDATA[<p>A common requirement in Data Warehousing is to apply a Data Quality &#8220;score&#8221; to records as they come in. The score is then used to identify and filter or fix bad data coming in depending on its assigned quality.</p>
<p>A practical example of this might be that in a Customer Address record, a missing Postcode might attract a high score, as it&#8217;s a very important field. However, a badly formatted work, home or mobile telephone number may attract a lower score, as it may not be as important to the business. Cumulatively, though, if all three numbers are badly formatted, the combined score may need to be high enough for the record to be examined.</p>
<p>An example of this is below. A failed Postcode gets a score of 3, and a failed telephone number gets a score of 1. Thus, anything with a score of 3 or above has either a failed Postcode or 3 failed telephone numbers, and can be subject to special handling.</p>
<div id="attachment_1078" class="wp-caption alignnone" style="width: 586px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/09/ssdqs_ssis_scoring_1.png"><img class="size-full wp-image-1078" title="Data Quality Scoring Example" src="http://www.bimonkey.com/wp-content/uploads/2011/09/ssdqs_ssis_scoring_1.png" alt="Data Quality Scoring Example" width="576" height="136" /></a><p class="wp-caption-text">Fig 1: Data Quality Scoring Example</p></div>
<p>From the example above we can see that how scores are calculated and used is fairly arbitrary. SSDQS itself doesn&#8217;t natively support assigning a score or weight to a failed data item, but it does provide a flexible engine to help us decide what counts as a failed data item. SSIS can then react to this pass / fail behaviour and apply a score.</p>
<h2>Setting up an SSDQS Knowledge Base for Scoring</h2>
<p>Given that the basis for scoring is pretty binary in nature, I set up a simple KB that had domains that would either pass or fail a piece of data. I first created a data set with three data fields:</p>
<ul>
<li>Year &#8211; Values ranging from 1970 to 2025</li>
<li>Value &#8211; Values ranging from 0 to 100</li>
<li>Code &#8211; Values A,B,C,D,E</li>
</ul>
<p>I then set up a KB to evaluate the fields as follows:</p>
<ul>
<li>Year &#8211; Valid from 1975 to 2020</li>
<li>Value &#8211; Valid from 10 to 95</li>
<li>Code &#8211; Valid values A,C,E</li>
</ul>
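<p>As a rough illustration of what the KB is doing, the sketch below re-creates the three Domain Rules in Python and reports a status per field. It mimics only the behaviour described in this post (a failed rule gives &#8220;Invalid&#8221;; anything else comes back &#8220;Unknown&#8221; because no Domain Values are loaded). It is not the DQS engine or its API, and treating the range bounds as inclusive is my assumption.</p>

```python
from typing import Any, Callable, Dict

# Re-creation of the three Domain Rules (bounds assumed to be inclusive).
DOMAIN_RULES: Dict[str, Callable[[Any], bool]] = {
    "Year": lambda v: 1975 <= v <= 2020,
    "Value": lambda v: 10 <= v <= 95,
    "Code": lambda v: v in {"A", "C", "E"},
}


def domain_statuses(record: Dict[str, Any]) -> Dict[str, str]:
    """Return a per-field status the way this KB reports it: 'Invalid' when a
    Domain Rule fails, otherwise 'Unknown' (no Domain Values are loaded)."""
    return {
        field: "Unknown" if rule(record[field]) else "Invalid"
        for field, rule in DOMAIN_RULES.items()
    }


if __name__ == "__main__":
    print(domain_statuses({"Year": 1972, "Value": 50, "Code": "B"}))
    # {'Year': 'Invalid', 'Value': 'Unknown', 'Code': 'Invalid'}
```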
<p>Note that I did not set up any Domain Values or do any training &#8211; I just set up the KB, Domains and Domain Rules. All I want to use DQS for is to identify records that are invalid for SSIS to use in scoring.</p>
<h2>Using SSIS to Score SSDQS output</h2>
<p>Next I hooked up my SSIS Data Quality Cleansing Component to push the source data through the Knowledge Base and return the status of each column after it passes through. As there are no preloaded valid values in the Domains, the status comes back as either &#8220;Invalid&#8221; (it failed the Domain rule) or &#8220;Unknown&#8221; (in this configuration, this means the value is correct).</p>
<div id="attachment_1081" class="wp-caption alignnone" style="width: 624px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/10/ssdqs_ssis_scoring_2.png"><img class="size-full wp-image-1081" title="DQS output Data Viewer" src="http://www.bimonkey.com/wp-content/uploads/2011/10/ssdqs_ssis_scoring_2.png" alt="DQS output Data Viewer" width="614" height="243" /></a><p class="wp-caption-text">DQS output Data Viewer</p></div>
<p>The Data Quality Cleansing Component doesn&#8217;t support scoring itself, so this has to be added using a Derived Column on an item-by-item basis. Using a simple IF / THEN / ELSE expression, I assign a score of 1 to each failed column based on its status, as below:</p>
<div id="attachment_1082" class="wp-caption alignnone" style="width: 509px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/10/ssdqs_ssis_scoring_3.png"><img class="size-full wp-image-1082" title="Applying Score using a Derived Column" src="http://www.bimonkey.com/wp-content/uploads/2011/10/ssdqs_ssis_scoring_3.png" alt="Applying Score using a Derived Column" width="499" height="107" /></a><p class="wp-caption-text">Applying Score using a Derived Column</p></div>
<p>Because a Derived Column transform can&#8217;t reference columns created within the same transform, I then need to add a second Derived Column transform downstream to weight the scores and add them together into a final, record-level score:</p>
<div id="attachment_1083" class="wp-caption alignnone" style="width: 620px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/10/ssdqs_ssis_scoring_4.png"><img class="size-full wp-image-1083" title="Aggregating and Weighting Score using a Derived Column" src="http://www.bimonkey.com/wp-content/uploads/2011/10/ssdqs_ssis_scoring_4.png" alt="Aggregating and Weighting Score using a Derived Column" width="610" height="57" /></a><p class="wp-caption-text">Aggregating and Weighting Score using a Derived Column</p></div>
<p>This results in a final Data Quality &#8220;Score&#8221; assigned to each record:</p>
<div id="attachment_1084" class="wp-caption alignnone" style="width: 375px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/10/ssdqs_ssis_scoring_5.png"><img class="size-full wp-image-1084" title="Weighted Scoring output Data Viewer" src="http://www.bimonkey.com/wp-content/uploads/2011/10/ssdqs_ssis_scoring_5.png" alt="Weighted Scoring output Data Viewer" width="365" height="251" /></a><p class="wp-caption-text">Weighted Scoring output Data Viewer</p></div>
<p>What you then do with these scores is up to you. In my example package, I used a Conditional Split to send records with a score over a certain threshold to a different destination:</p>
<div id="attachment_1085" class="wp-caption alignnone" style="width: 310px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/10/ssdqs_ssis_scoring_6.png"><img class="size-medium wp-image-1085" title="DQS Scoring example Data Flow" src="http://www.bimonkey.com/wp-content/uploads/2011/10/ssdqs_ssis_scoring_6-300x244.png" alt="DQS Scoring example Data Flow" width="300" height="244" /></a><p class="wp-caption-text">DQS Scoring example Data Flow</p></div>
<h2>Improving the Scoring process</h2>
<p>The example I&#8217;ve created is quite simplistic: it has hard-coded weightings and redirection thresholds, and can only react to two (of a possible three) record statuses. The process could be made more flexible using metadata-driven weightings and thresholds (provided as package inputs).</p>
<p>Beyond that, you have the option to handle the clean and dirty data more appropriately, such as by pushing dirty data into a cleanup process or halting ETL processes.</p>
<p>The key takeaway here is that DQS enables you to create a scoring process that is independent of the actual Data Quality rules that pass or fail a piece of data. The DQS Knowledge Base becomes your flexible definition of what qualifies as a good or bad record, instead of rules hard-coded in SQL or Derived Columns, which could get messy very quickly.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/10/simple-data-quality-scoring-with-ssdqs-ssis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using .NET Framework 3.5 in SSIS Scripts</title>
		<link>http://www.bimonkey.com/2011/09/using-net-framework-3-5-in-ssis-scripts/</link>
		<comments>http://www.bimonkey.com/2011/09/using-net-framework-3-5-in-ssis-scripts/#comments</comments>
		<pubDate>Fri, 23 Sep 2011 00:16:23 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Script Task]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1067</guid>
		<description><![CDATA[Thanks to Valentino Vranken who provides this useful post: Using A .Net 3.5 Assembly In SSIS 2008 By default SSIS Script Tasks / Components reference .NET Framework 2.0, which was confusing me as I tried to implement a solution around managing times in different Time Zones which required the TimeZoneInfo Class which only exists in [...]]]></description>
			<content:encoded><![CDATA[<p>Thanks to Valentino Vranken who provides this useful post: <a title="Using A 3.5 Assembly In SSIS 2008" href="http://blog.hoegaerden.be/2011/05/07/using-a-3-5-assembly-in-ssis-2008/">Using A .Net 3.5 Assembly In SSIS 2008</a></p>
<p>By default, SSIS Script Tasks / Components reference .NET Framework 2.0. This confused me while I was implementing a solution for managing times in different Time Zones, which required the TimeZoneInfo class, available only in .NET Framework 3.5 and higher. (This was what I was trying to implement: <a title="DayLight Saving in Derived Column" href="http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/76a58fee-471c-436d-bea4-c4698df5a4cf/">Daylight Savings in Script Task</a>.) Because the script referenced .NET Framework 2.0, the TimeZoneInfo class wasn&#8217;t available. Switch the target framework to 3.5 and bingo: all good.</p>
<p>The tl;dr version: in your Script Task or Component, while editing the script, open the project Properties and change the Target Framework to 3.5 to use features available in 3.5.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/09/using-net-framework-3-5-in-ssis-scripts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OLE DB is dead &#8211; Long live ODBC!</title>
		<link>http://www.bimonkey.com/2011/09/ole-db-is-dead-long-live-odbc/</link>
		<comments>http://www.bimonkey.com/2011/09/ole-db-is-dead-long-live-odbc/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 04:21:53 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[ODBC]]></category>
		<category><![CDATA[OLE DB]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1055</guid>
		<description><![CDATA[Something I picked up in passing at the MSDN SSIS forum &#8211; SQL Native Client OLE DB is being deprecated post Denali. See this post on the SQL Native Client Team Blog. What does this mean? First up this only applies to SQL Server connections &#8211; other OLE DB connectivity such as Teradata and Oracle [...]]]></description>
			<content:encoded><![CDATA[<p>Something I picked up in passing at the <a title="SQL Server Integration Services" href="http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/threads">MSDN SSIS forum</a> &#8211; SQL Native Client OLE DB is being deprecated post Denali. See <a title="Microsoft is Aligning with ODBC for Native Relational Data Access" href="http://blogs.msdn.com/b/sqlnativeclient/archive/2011/08/29/microsoft-is-aligning-with-odbc-for-native-relational-data-access.aspx">this post on the SQL Native Client Team Blog</a>.</p>
<p>What does this mean?</p>
<ol>
<li>First, this only applies to SQL Server connections; OLE DB connectivity to other sources such as Teradata and Oracle will be unaffected.</li>
<li>In the short term, nothing: if you are using OLE DB it will carry on working for a fair few years yet. But if you are planning for a system&#8217;s longevity, look to using ODBC for all your SQL Server connectivity needs.</li>
</ol>
<p>It&#8217;s a slightly confusing move given that ODBC underperforms OLE DB, so it&#8217;s bad news for us SSIS people unless the change also brings some significant performance and capability improvements.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/09/ole-db-is-dead-long-live-odbc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Server Data Quality Services &amp; SSIS &#8211; Performance</title>
		<link>http://www.bimonkey.com/2011/09/sql-server-data-quality-services-ssis-performance/</link>
		<comments>http://www.bimonkey.com/2011/09/sql-server-data-quality-services-ssis-performance/#comments</comments>
		<pubDate>Wed, 14 Sep 2011 04:26:41 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Denali]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1051</guid>
		<description><![CDATA[This is a snippet of a post on the performance of the DQS engine when called from SSIS. I&#8217;ve created a simple number based Domain rule and replicated it 5 times in my knowledge base. My package then feeds copies of the same set of data into the DQS component (5000 rows) and runs it [...]]]></description>
			<content:encoded><![CDATA[<p>This is a snippet of a post on the performance of the DQS engine when called from SSIS. I&#8217;ve created a simple number based Domain rule and replicated it 5 times in my knowledge base. My package then feeds copies of the same set of data into the DQS component (5000 rows) and runs it through 1 &#8211; 5 domains.</p>
<p>The performance profile is as below:</p>
<div id="attachment_1052" class="wp-caption alignnone" style="width: 435px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/09/ssdqs_ssis_performance_1.png"><img class="size-full wp-image-1052" title="SSIS DQS Component Performance" src="http://www.bimonkey.com/wp-content/uploads/2011/09/ssdqs_ssis_performance_1.png" alt="SSIS DQS Component Performance" width="425" height="289" /></a><p class="wp-caption-text">SSIS DQS Component Performance</p></div>
<p>There seems to be a fairly linear relationship between the number of domains being processed and execution time. Note that I&#8217;ve created a dummy value for &#8220;0&#8221; domains to indicate what the start-up time of the DQS component might be, as it&#8217;s impossible to have a DQS Cleansing Component in the flow with no columns mapped.</p>
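<p>One way to estimate that zero-domain start-up figure without a dummy value is to fit a straight line through the measured points and read off the intercept. The Python sketch below does an ordinary least-squares fit; the timings in it are made-up illustrative numbers, not the measurements in the chart.</p>

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    domains = [1, 2, 3, 4, 5]
    seconds = [14.0, 18.0, 22.0, 26.0, 30.0]  # illustrative numbers only
    per_domain, start_up = fit_line(domains, seconds)
    print(f"~{per_domain:.1f}s per domain, ~{start_up:.1f}s start-up (0 domains)")
```

<p>With real timings the points won't sit exactly on a line, so the intercept is an estimate; it is only meaningful if the relationship really is close to linear, as it appears to be here.</p>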
<p>I&#8217;d ignore the actual numbers: this is on a development VM which is definitely not configured for performance, and I&#8217;m aware the DQS Team are working on performance issues (though by the looks of it, they&#8217;d better be working hard).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/09/sql-server-data-quality-services-ssis-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Server Data Quality Services &amp; SSIS</title>
		<link>http://www.bimonkey.com/2011/09/sql-server-data-quality-services-ssis/</link>
		<comments>http://www.bimonkey.com/2011/09/sql-server-data-quality-services-ssis/#comments</comments>
		<pubDate>Mon, 05 Sep 2011 02:34:42 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data Quality Services]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Denali]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=1037</guid>
		<description><![CDATA[So far in my posts on SSDQS we&#8217;ve looked at the Data Quality Services Client and building SSDQS Knowledge Bases. Now in practice when handling bulk data a need to reference this in routine loads is needed, and to nobody&#8217;s surprise, SSIS is the tool for the job. The DQS Cleansing Component So, in our [...]]]></description>
			<content:encoded><![CDATA[<p>So far in my posts on SSDQS we&#8217;ve looked at the Data Quality Services Client and building SSDQS Knowledge Bases. Now in practice when handling bulk data a need to reference this in routine loads is needed, and to nobody&#8217;s surprise, SSIS is the tool for the job.</p>
<h2>The DQS Cleansing Component</h2>
<p>So, in our (shiny, new) SSIS Toolbox we have a new component to connect to DQS &#8211; the DQS Cleansing Component:</p>
<div id="attachment_1040" class="wp-caption alignnone" style="width: 448px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/08/ssdqs_ssis_1_dqs_cleansing_component.png"><img class="size-full wp-image-1040" title="SSIS DQS Cleansing Component" src="http://www.bimonkey.com/wp-content/uploads/2011/08/ssdqs_ssis_1_dqs_cleansing_component.png" alt="SSIS DQS Cleansing Component" width="438" height="84" /></a><p class="wp-caption-text">SSIS DQS Cleansing Component</p></div>
<p>The DQS cleansing component pushes a data flow to the DQS Engine for validation. This requires a special Connection Manager, the DQS Cleansing Connection Manager, which as we can see below is a simple creature:</p>
<div id="attachment_1042" class="wp-caption alignnone" style="width: 310px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/08/ssdqs_ssis_2_dqs_cleansing_connection_manager.png"><img class="size-medium wp-image-1042" title="SSIS DQS Cleansing Connection Manager" src="http://www.bimonkey.com/wp-content/uploads/2011/08/ssdqs_ssis_2_dqs_cleansing_connection_manager-300x210.png" alt="SSIS DQS Cleansing Connection Manager" width="300" height="210" /></a><p class="wp-caption-text">SSIS DQS Cleansing Connection Manager</p></div>
<p>The sole option at this point is to choose which DQS Server to point at. So, let&#8217;s look at what we get in the SSIS Component once we use the Connection Manager:</p>
<div id="attachment_1043" class="wp-caption alignnone" style="width: 310px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/08/ssdqs_ssis_3_dqs_cleansing_component.png"><img class="size-medium wp-image-1043" title="SSIS DQS Cleansing Component Connection Manager options" src="http://www.bimonkey.com/wp-content/uploads/2011/08/ssdqs_ssis_3_dqs_cleansing_component-300x153.png" alt="SSIS DQS Cleansing Component Connection Manager options" width="300" height="153" /></a><p class="wp-caption-text">SSIS DQS Cleansing Component Connection Manager options</p></div>
<p>Once again it&#8217;s nice and simple: choosing your Connection Manager lets you pick from a list of Published Knowledge Bases. Once a KB is selected, a list of the available Domains is populated, though there is nothing you can do with this list other than review it. So next we move to the Mapping tab:</p>
<div id="attachment_1044" class="wp-caption alignnone" style="width: 310px"><a href="http://www.bimonkey.com/wp-content/uploads/2011/08/ssdqs_ssis_4_dqs_cleansing_component_mapping.png"><img class="size-medium wp-image-1044" title="SSIS DQS Cleansing Component Mapping Tab" src="http://www.bimonkey.com/wp-content/uploads/2011/08/ssdqs_ssis_4_dqs_cleansing_component_mapping-300x172.png" alt="SSIS DQS Cleansing Component Mapping Tab" width="300" height="172" /></a><p class="wp-caption-text">SSIS DQS Cleansing Component Mapping Tab</p></div>
<p>The usual suspects are there &#8211; pick your input columns in the top half of the tab and they become available for mapping in the lower half. Each input column can be mapped to a single Domain (I can&#8217;t quite see how Composite Domains work in this context). You then get three output columns: Output, Corrected Output and Status Output. Output is just the original value passed through, Corrected Output is the value as corrected by the DQS Engine, and Status Output is the record status (Correct, Corrected or Unknown, corresponding to the DQS Data Quality Project statuses). In the Advanced Editor you can also switch on Confidence and Reason outputs, which relate to matching projects.</p>
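<p>To make the three mapped outputs more concrete, here is a toy model of what one mapped column might produce per row. It is a sketch under loud assumptions. The hard-coded <code>VALID_VALUES</code> and <code>CORRECTIONS</code> sets are hypothetical stand-ins for a published Knowledge Base domain, and <code>cleanse</code> and <code>CleansedColumn</code> are illustrative names. None of this is the DQS API, and the real matching logic lives on the DQS Server.</p>

```python
from dataclasses import dataclass

# Hypothetical stand-in for one published KB domain. In real DQS this
# knowledge lives on the DQS Server, not in package code.
VALID_VALUES = {"Sydney", "Melbourne", "Brisbane"}
CORRECTIONS = {"Sydeny": "Sydney", "Melbourn": "Melbourne"}


@dataclass
class CleansedColumn:
    output: str            # original value, passed through untouched
    corrected_output: str  # value after applying the domain knowledge
    status_output: str     # "Correct", "Corrected" or "Unknown"


def cleanse(value: str) -> CleansedColumn:
    """Toy model of the three outputs a single mapped column produces."""
    if value in VALID_VALUES:
        return CleansedColumn(value, value, "Correct")
    if value in CORRECTIONS:
        return CleansedColumn(value, CORRECTIONS[value], "Corrected")
    # Not recognised by the domain: in this toy the value is echoed back.
    return CleansedColumn(value, value, "Unknown")


if __name__ == "__main__":
    for city in ["Sydney", "Sydeny", "Perth"]:
        print(cleanse(city))
```

<p>Each row keeps its original value alongside the corrected one. That lets you audit what DQS changed without losing the source data.</p>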
<p>Note that there is only a single output for the DQS Cleansing Component. If you want to send records with different statuses (for example Correct, Corrected or Unknown) to different locations, you will need to do so with a downstream Conditional Split component.</p>
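<p>For anyone who hasn&#8217;t used a Conditional Split before, the sketch below mimics its routing rule. Conditions are tested in order, each row goes to the first output whose condition is true, and anything left over goes to the default output. In SSIS itself each condition would be an expression along the lines of <code>[Record Status] == "Correct"</code>. The column name <code>Record Status</code> and the Python function <code>conditional_split</code> are assumptions for illustration only.</p>

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical name for the status column coming out of the DQS component.
STATUS_COLUMN = "Record Status"


def conditional_split(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Route each row to the first matching output, like a Conditional Split.

    Conditions are evaluated in order; rows that match none of them
    go to the "Default" output.
    """
    conditions = [
        ("Correct", lambda r: r[STATUS_COLUMN] == "Correct"),
        ("Corrected", lambda r: r[STATUS_COLUMN] == "Corrected"),
    ]
    outputs: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        for name, predicate in conditions:
            if predicate(row):
                outputs[name].append(row)
                break
        else:
            # No condition matched: the row falls through to the default output.
            outputs["Default"].append(row)
    return dict(outputs)
```

<p>Here Unknown records land in the default output, which is a natural place to hook up a review table.</p>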
<h2>Summary</h2>
<p>So we&#8217;ve had a quick look at the basics of automating DQS activities using SSIS, and how SSIS plugs into the DQS Server. Subsequent posts will dig into some practical implementation issues, including performance.</p>
<p>Some further reading can be found here:</p>
<ul>
<li><a title="Using the SSIS DQS Cleansing Component" href="http://blogs.msdn.com/b/dqs/archive/2011/07/18/using-the-ssis-dqs-cleansing-component.aspx">Using the SSIS DQS Cleansing Component</a> &#8211; from the DQS Blog</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/09/sql-server-data-quality-services-ssis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Connect Improvements on Pivot and SCD denied&#8230;</title>
		<link>http://www.bimonkey.com/2011/04/connect-improvements-on-pivot-and-scd-denied/</link>
		<comments>http://www.bimonkey.com/2011/04/connect-improvements-on-pivot-and-scd-denied/#comments</comments>
		<pubDate>Wed, 20 Apr 2011 00:08:52 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Pivot]]></category>
		<category><![CDATA[Slowly Changing Dimension]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=896</guid>
		<description><![CDATA[It appears that our attempts to fix a couple of the less usable components in SSIS have been canned, once again. See the following links for the generic &#8220;Won&#8217;t Fix&#8221; response on putting a UI on the Pivot and making the SCD perform better than, say, a dead possum. Whilst this is slightly disappointing, fortunately [...]]]></description>
			<content:encoded><![CDATA[<p>It appears that our attempts to fix a couple of the less usable components in SSIS have been canned, once again. See the following links for the generic &#8220;Won&#8217;t Fix&#8221; response on <a title="SSIS Pivot Transformation - UI Needed " href="http://connect.microsoft.com/SQLServer/feedback/details/632051/ssis-pivot-transformation-ui-needed">putting a UI on the Pivot</a> and <a title="SSIS Slowly Changing Dimension (SCD) component performance unusably slow " href="http://connect.microsoft.com/SQLServer/feedback/details/632052/ssis-slowly-changing-dimension-scd-component-performance-unusably-slow">making the SCD perform</a> better than, say, a dead possum.</p>
<p>Whilst this is slightly disappointing, fortunately for those who like the <a title="SSIS Dimension Merge SCD Component" href="http://dimensionmergescd.codeplex.com/">Dimension Merge</a> (formerly Kimball SCD) component, Pragmatic Works have come to an arrangement with <a title="Todd McDermid's Blog" href="http://toddmcdermid.blogspot.com/">Todd McDermid</a> to include a supported version of it in a future release of their <a title="TaskFactory" href="http://pragmaticworks.com/products/business-intelligence/taskfactory/">TaskFactory</a> bundle of SSIS components. This means that those who work in risk-averse enterprises will be able to use the component, as it will come with a support package.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2011/04/connect-improvements-on-pivot-and-scd-denied/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SSIS Components in Denali</title>
		<link>http://www.bimonkey.com/2010/12/ssis-components-in-denali/</link>
		<comments>http://www.bimonkey.com/2010/12/ssis-components-in-denali/#comments</comments>
		<pubDate>Thu, 16 Dec 2010 23:42:01 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Denali]]></category>
		<category><![CDATA[Pivot]]></category>
		<category><![CDATA[Slowly Changing Dimension]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=848</guid>
		<description><![CDATA[I&#8217;ve finally had a chance to boot up SSIS in Denali and play with it myself. In this post I&#8217;m going to look at some of the things that have &#8211; and, surprisingly, have not &#8211; appeared&#8230; The Components I&#8217;ve had a scan through the Tasks in the Control Flow and Components in the Data Flow to see what&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve finally had a chance to boot up SSIS in Denali and play with it myself. In this post I&#8217;m going to look at some of the things that have &#8211; and, surprisingly, have <strong>not</strong> &#8211; appeared&#8230;</p>
<h2>The Components</h2>
<p>I&#8217;ve had a scan through the Tasks in the Control Flow and Components in the Data Flow to see what&#8217;s there. The first observation is that all the components have got funky new icons. I don&#8217;t see anything new function-wise on the Control Flow, but it&#8217;s interesting to note that the<strong> Execute DTS 2000 Package</strong> task remains, implying that this piece of backwards compatibility will be retained.</p>
<div class="wp-caption alignnone" style="width: 205px"><img title="SSIS Data Correction Component" src="http://www.bimonkey.com/uploads/ssis/DataCorrectionComponent.jpg" alt="SSIS Data Correction Component" width="195" height="69" /><p class="wp-caption-text">SSIS Data Correction Component</p></div>
<p>There&#8217;s only one completely new item available on the Data Flow, which is the <strong>Data Correction</strong> component. However, as per <a title="http://bifuture.blogspot.com/2010/11/ssis-data-correction-component-denali.html" href="http://bifuture.blogspot.com/2010/11/ssis-data-correction-component-denali.html">this blog post</a> by <a title="BI Future blog" href="http://bifuture.blogspot.com/">Hennie de Nooijer</a> on his BI Future blog, it appears to be unusable. It looks like this is where SSIS will touch upon the Data Quality Services component of SQL Server (SSDQS). However, it has no Advanced Editor and can&#8217;t be configured through the normal editor, so there&#8217;s not much I can find out about how it works. One for review in a future CTP.</p>
<p>There&#8217;s also a sort-of-new item in the form of the Source and Destination Assistants, which are effectively wizards for building Source and Destination adapters. They tidy up the Toolbox a bit, but otherwise don&#8217;t seem to add much for experienced users.</p>
<h2>Fix these problems!</h2>
<p>However, there are things that desperately needed to change that haven&#8217;t. The <strong>Slowly Changing Dimension </strong>component has had no visible changes, leaving us with the same RBAR toy we already have, whose performance is usually poor to the point of unusability (<a title="Improve SCD task performance (like using TableDifference)" href="https://connect.microsoft.com/SQLServer/feedback/details/177746/improve-scd-task-performance-like-using-tabledifference">a previous suggestion on Connect was killed in 2007</a>). The <strong>Pivot</strong> Transformation still has no usable interface (a <a title="Pivot UI in SSIS" href="https://connect.microsoft.com/SQLServer/feedback/details/127104/pivot-ui-in-ssis">previous suggestion on Connect was killed in 2006</a>) and remains confusing and difficult to configure. The <strong>Derived Column</strong> transform still locks text fields into Unicode strings. If you change the data type and then edit the expression, it overrides your setting. There also appear to be no new functions (<a title="Extend function library and allow User Defined functions for SSIS Derived Column Transformation" href="https://connect.microsoft.com/SQLServer/feedback/details/471287/extend-function-library-and-allow-user-defined-functions-for-ssis-derived-column-transformation">my request on Connect to make this extensible has been killed</a>).</p>
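<p>For readers unfamiliar with the RBAR complaint, the difference is between checking the dimension one incoming row at a time and comparing the two sets in one pass. The latter is what the TableDifference-style suggestion above points towards. Below is a minimal sketch of set-based change detection. It assumes both inputs fit in memory and uses illustrative names (<code>classify_changes</code>, <code>business_key</code>, <code>tracked_columns</code>). It is not how the built-in SCD component, TableDifference or Dimension Merge are actually implemented.</p>

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def classify_changes(
    incoming: List[Row],
    existing: List[Row],
    business_key: str,
    tracked_columns: List[str],
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split incoming rows into (new, changed, unchanged) in a single pass.

    The existing dimension is loaded once into a dict keyed on the business
    key, so each incoming row costs one in-memory lookup instead of a
    separate query against the dimension table.
    """
    current = {row[business_key]: row for row in existing}
    new: List[Row] = []
    changed: List[Row] = []
    unchanged: List[Row] = []
    for row in incoming:
        match = current.get(row[business_key])
        if match is None:
            new.append(row)  # business key not in the dimension yet
        elif any(row[c] != match[c] for c in tracked_columns):
            changed.append(row)  # candidate for a Type 1 update or Type 2 insert
        else:
            unchanged.append(row)  # nothing to do
    return new, changed, unchanged
```

<p>Whether a changed row becomes an in-place update or a new versioned row is then a separate decision. That split is roughly what the SCD wizard asks you to configure per column.</p>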
<p>There generally seems to have been a lack of updates addressing the existing issues with individual components. Some of these components, especially the SCD and Pivot, have been poor since day 1 (now 5 years ago), so it&#8217;s a shame to see no changes at all.</p>
<p>To at least get the Pivot and SCD fixed I&#8217;m raising new connect items for them &#8211; <strong>please vote for them</strong> by following the links below and we may get them fixed in a future CTP: </p>
<ul>
<li><a title="SSIS Pivot Transformation - UI Needed" href="https://connect.microsoft.com/SQLServer/feedback/details/632051/ssis-pivot-transformation-ui-needed">Request for a decent UI for the Pivot</a></li>
<li><a title="SSIS Slowly Changing Dimension (SCD) component performance unusably slow" href="https://connect.microsoft.com/SQLServer/feedback/details/632052/ssis-slowly-changing-dimension-scd-component-performance-unusably-slow">Suggestions to improve the SCD component</a></li>
</ul>
<p>Thanks in advance!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/12/ssis-components-in-denali/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

