<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BI Monkey &#187; Microsoft BI</title>
	<atom:link href="http://www.bimonkey.com/category/microsoft-bi/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bimonkey.com</link>
	<description>James Beresford on Microsoft BI and Consulting in Sydney, Australia</description>
	<lastBuildDate>Mon, 06 Sep 2010 00:59:17 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.3</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Convert Text Stream to String</title>
		<link>http://www.bimonkey.com/2010/09/convert-text-stream-to-string/</link>
		<comments>http://www.bimonkey.com/2010/09/convert-text-stream-to-string/#comments</comments>
		<pubDate>Mon, 06 Sep 2010 00:59:17 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[Flat File]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Text Stream]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=809</guid>
		<description><![CDATA[One of the ongoing challenges with SSIS is its difficulty in handling complex or damaged text files. One approach to dealing with such files is to bring them all in as one wide text column and then split them using code. Sometimes, the file is too wide for that approach, so below is an extension of [...]]]></description>
			<content:encoded><![CDATA[<p>One of the ongoing challenges with SSIS is its difficulty in handling complex or damaged text files. One approach to dealing with such files is to <a title="Importing the entire contents of the file into a single column, then parsing it in a script task" href="http://www.bimonkey.com/2009/06/flat-file-source-error-the-column-delimiter-for-column-columnname-was-not-found/">bring them all in as one wide text column and then split them using code</a>. Sometimes, the file is too wide for that approach, so below is an extension of that method where you import the column as a text stream (DT_TEXT, or Unicode DT_NTEXT) and then split the text stream in a script transformation:</p>
<blockquote><p>        <span style="color: #008000;">' Declare variables</span><br />
        <span style="color: #0000ff;">Dim </span>TextStream <span style="color: #0000ff;">As</span> <span style="color: #0000ff;">Byte</span>()            <span style="color: #008000;">' To hold Text Stream</span><br />
        <span style="color: #0000ff;">Dim</span> TextStreamAsString <span style="color: #0000ff;">As String</span>  <span style="color: #008000;">  ' To hold Text Stream converted to String</span><br />
        <span style="color: #0000ff;">Dim</span> StringArray() <span style="color: #0000ff;">As String</span>       <span style="color: #008000;">  ' To contain split Text Stream</span></p>
<p><strong>       <span style="color: #008000;"> ' Load Text Stream into variable (read the full length of the same BLOB column)</span><br />
        TextStream = Row.TextStreamColumn.GetBlobData(0, <span style="color: #0000ff;">CInt</span>(Row.TextStreamColumn.Length))</strong></p>
<p><strong>     <span style="color: #008000;">   ' Convert Text Stream to string</span><br />
        TextStreamAsString = System.Text.Encoding.ASCII.GetString(TextStream)</strong></p>
<p>        <span style="color: #008000;">' Split string into array (zero-based: element 0 holds the first field) and output</span><br />
        StringArray = TextStreamAsString.Split(<span style="color: #800000;">"#"c</span>)        </p>
<p>        Row.Column1 = StringArray(1).ToString<br />
        Row.Column2 = StringArray(2).ToString<br />
        Row.Column3 = StringArray(3).ToString  </p></blockquote>
<p>An important thing to note is that in the step where the Text Stream is converted to a string, the Encoding will depend on the type of text stream you are bringing in &#8211; Unicode (DT_NTEXT) streams will need System.Text.Encoding.Unicode instead of System.Text.Encoding.ASCII. Also, I have used a hash (&#8220;#&#8221;) as the column delimiter, but that value will vary depending on what type of file you are bringing in.</p>
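<p>To see why the choice of encoding matters, here is a minimal sketch in C# (which SSIS 2008 script components also accept). It is not part of the package: the TextStreamDemo class and the DecodeAndSplit helper are names made up for illustration, and the byte array simply stands in for whatever GetBlobData would return from a DT_NTEXT column.</p>
<blockquote><p>using System;<br />
using System.Text;</p>
<p>static class TextStreamDemo<br />
{<br />
    // Hypothetical helper: decode a BLOB's bytes with the given encoding, then split on the delimiter<br />
    public static string[] DecodeAndSplit(byte[] blob, Encoding encoding, char delimiter)<br />
    {<br />
        string text = encoding.GetString(blob);<br />
        return text.Split(delimiter);<br />
    }</p>
<p>    static void Main()<br />
    {<br />
        // Stand-in for the bytes of a Unicode (DT_NTEXT) text stream<br />
        byte[] blob = Encoding.Unicode.GetBytes("id#name#city");</p>
<p>        // Matching encoding: the fields come back clean<br />
        string[] good = DecodeAndSplit(blob, Encoding.Unicode, '#');<br />
        Console.WriteLine(good[1]);            // name</p>
<p>        // Wrong encoding: every second byte of UTF-16 text becomes a null character<br />
        string[] bad = DecodeAndSplit(blob, Encoding.ASCII, '#');<br />
        Console.WriteLine(bad[1] == "name");   // False<br />
    }<br />
}</p></blockquote>
<p>Both calls return three elements, but the ASCII version leaves a null character next to every letter, so the values look nearly right in a data viewer while comparisons and conversions downstream quietly fail.</p>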
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/09/convert-text-stream-to-string/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Modifying an SSIS Package through code</title>
		<link>http://www.bimonkey.com/2010/09/modifying-an-ssis-package-through-code/</link>
		<comments>http://www.bimonkey.com/2010/09/modifying-an-ssis-package-through-code/#comments</comments>
		<pubDate>Thu, 02 Sep 2010 11:16:12 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[C#]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=806</guid>
		<description><![CDATA[Part of any SSIS development experience inevitably results in you discovering a minor mistake or something that was missed a long way into the development cycle &#8211; or even after, in testing (you do test your code, right?). Then you are faced with the tedious job of opening every single package, making a change in [...]]]></description>
			<content:encoded><![CDATA[<p>Any SSIS development effort inevitably leads to you discovering a minor mistake, or something that was missed, a long way into the development cycle &#8211; or even after, in testing (you do test your code, right?). Then you are faced with the tedious job of opening every single package, making a change in every one&#8230; and getting some serious mouse finger, much like I did when I learned about <a title="SQL Server SSIS BufferTempStoragePath and BLOBTempStoragePath" href="http://www.bimonkey.com/2008/04/blobtempstoragepath-and-buffertempstoragepath/">BufferTempStoragePath</a>.</p>
<p>Fortunately, there is a way to automate these fixes. The SSIS Object model is (relatively) easily manipulated through .NET languages &#8211; so it&#8217;s not too difficult to write a small program that will change your package. Below is a sample I knocked up that will add a variable to an existing package and save the change:</p>
<blockquote><p><span style="color: #0000ff;">using</span> System;<br />
<span style="color: #0000ff;">using</span> Microsoft.SqlServer.Server;<br />
<span style="color: #0000ff;">using</span> Microsoft.SqlServer.Dts.Runtime;</p>
<p><span style="color: #0000ff;">namespace</span> Package_Modifier<br />
{<br />
    <span style="color: #0000ff;">class</span> Program<br />
    {<br />
        <span style="color: #0000ff;">static void</span> Main(<span style="color: #0000ff;">string</span>[] args)<br />
        {<br />
            <span style="color: #008000;">// Initialize an Application and Package object</span><br />
            <span style="color: #33cccc;">Application</span> app = <span style="color: #0000ff;">new</span> <span style="color: #33cccc;">Application</span>();<br />
            <span style="color: #33cccc;">Package</span> package = <span style="color: #0000ff;">null</span>;</p>
<p>            <span style="color: #008000;">// Set a package path</span><br />
            <span style="color: #33cccc;">String</span> pkgPath = <span style="color: #800000;">"C:\\BI Monkey\\SamplePackage.dtsx"</span>;</p>
<p>            <span style="color: #008000;">// Load the package in package object</span><br />
            package = app.LoadPackage(pkgPath, <span style="color: #0000ff;">null</span>);</p>
<p><strong>            <span style="color: #008000;">// Add the new variable (name, read-only flag, namespace, initial value)</span><br />
            package.Variables.Add(<span style="color: #800000;">"NewVar"</span>, <span style="color: #0000ff;">false</span>, <span style="color: #800000;">"User"</span>, 0);</strong></p>
<p>            <span style="color: #008000;">// Save the package back to the same file</span><br />
            app.SaveToXml(pkgPath, package, <span style="color: #0000ff;">null</span>);<br />
        }<br />
    }<br />
}</p></blockquote>
<p>You can essentially make any change you like to a package &#8211; I&#8217;ve chosen adding a variable because it&#8217;s an easy manipulation of the package object and I&#8217;ve got a long way to go before I work out how to do anything much harder <img src='http://www.bimonkey.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
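<p>Since the pain point is making the same change in every package, the sample extends naturally to a loop over a folder. The following is only a sketch under a few assumptions: the folder path and the BatchProgram class name are hypothetical, the project references the Microsoft.SqlServer.ManagedDTS assembly, and I am assuming the Variables collection&#8217;s Contains check accepts the qualified &#8220;User::NewVar&#8221; name. Try it on copies of your packages first.</p>
<blockquote><p>using System;<br />
using System.IO;<br />
using Microsoft.SqlServer.Dts.Runtime;</p>
<p>namespace Package_Modifier<br />
{<br />
    class BatchProgram<br />
    {<br />
        static void Main(string[] args)<br />
        {<br />
            Application app = new Application();</p>
<p>            // Hypothetical folder holding the packages to fix<br />
            string pkgFolder = "C:\\BI Monkey\\Packages";</p>
<p>            foreach (string pkgPath in Directory.GetFiles(pkgFolder, "*.dtsx"))<br />
            {<br />
                Package package = app.LoadPackage(pkgPath, null);</p>
<p>                // Assumption: skip packages that already hold the variable, so reruns are harmless<br />
                if (!package.Variables.Contains("User::NewVar"))<br />
                {<br />
                    package.Variables.Add("NewVar", false, "User", 0);<br />
                    app.SaveToXml(pkgPath, package, null);<br />
                    Console.WriteLine("Updated " + pkgPath);<br />
                }<br />
            }<br />
        }<br />
    }<br />
}</p></blockquote>
<p>Saving only when something changed keeps untouched packages out of your source control diffs.</p>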
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/09/modifying-an-ssis-package-through-code/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SSIS ETL Framework v1 goes Beta!</title>
		<link>http://www.bimonkey.com/2010/08/ssis-etl-framework-v1-goes-beta/</link>
		<comments>http://www.bimonkey.com/2010/08/ssis-etl-framework-v1-goes-beta/#comments</comments>
		<pubDate>Sun, 08 Aug 2010 23:39:42 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Framework]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=799</guid>
		<description><![CDATA[A quick update on the status of the BI Monkey SSIS ETL Framework (on Codeplex @ http://ssisetlframework.codeplex.com/)
The Framework v1 has gone into Beta &#8211; I&#8217;ve completed all the testing, and now just need to tidy up the reports and add some extra logging capability &#8211; but it is now fully functional. I&#8217;ve left it in [...]]]></description>
			<content:encoded><![CDATA[<p>A quick update on the status of the BI Monkey SSIS ETL Framework (on Codeplex @ <a href="http://ssisetlframework.codeplex.com/">http://ssisetlframework.codeplex.com/</a>)</p>
<p>The Framework v1 has gone into Beta &#8211; I&#8217;ve completed all the testing, and now just need to tidy up the reports and add some extra logging capability &#8211; but it is now fully functional. I&#8217;ve left it in Beta as I want to get some feedback on it before I move it live, plus fix those small details I mentioned.</p>
<p>So now I will press on with updating the documentation (yes, really!) and start laying the foundations for the more Enterprise level v2 framework.</p>
<p>I look forward to your feedback &#8211; please take advantage of Codeplex&#8217;s issue logging functionality to help me manage bugs and improvements.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/08/ssis-etl-framework-v1-goes-beta/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>BI Documenter from Pragmatic Works</title>
		<link>http://www.bimonkey.com/2010/08/bi-documenter-from-pragmatic-works/</link>
		<comments>http://www.bimonkey.com/2010/08/bi-documenter-from-pragmatic-works/#comments</comments>
		<pubDate>Wed, 04 Aug 2010 10:44:11 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[3rd Party Tools]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=796</guid>
		<description><![CDATA[I recently had to demo SSIS to an enterprise as part of an ETL tool evaluation. One of the Microsoft BI stack&#8217;s weaknesses is the lack of Data Lineage tracking. What this means is there is nothing embedded in the toolset that allows you to identify clearly the source of a data item in a [...]]]></description>
			<content:encoded><![CDATA[<p>I recently had to demo SSIS to an enterprise as part of an ETL tool evaluation. One of the Microsoft BI stack&#8217;s weaknesses is the lack of Data Lineage tracking. What this means is there is nothing embedded in the toolset that allows you to identify clearly the source of a data item in a package / report / cube without digging into the development environment. Rumour has it this will be fixed in the next release, but nothing has yet been confirmed.</p>
<p>However, where the Microsoft BI stack has a competitor-beating edge is the 3rd Party ecosystem &#8211; so where there is a gap in the toolset, often another company will step in and fill it. In this case, Data Lineage issues are addressed by a tool created by <a title="Pragmatic Works" href="http://pragmaticworks.com/">Pragmatic Works</a> (which is run by Brian Knight, an SSIS heavyweight) called BI Documenter.</p>
<h2>BI Documenter Review</h2>
<p>So what does the tool do? It has 3 key functions:</p>
<p><strong>Documentation Generation:</strong> The tool auto-generates documentation for Databases, SSIS packages, SSAS Cubes and SSRS reports. It&#8217;s quick, and the output is pretty &#8211; and it&#8217;s really a bit useless. Its benefit is if you have to say you&#8217;ve produced some documentation and need to do so with minimal effort. The reason I say this is because it provides no context for why things have been done, what the purpose of the component is, where it fits in to the framework etc. My usual gripe with documentation I come across is that it only tells me the what, not the why, and the why helps me solve a problem. This tool can&#8217;t do anything to address this.</p>
<p><strong>Data Lineage:</strong> Now this is where the main value lies. Through simple navigation you can select any object (table / view / package etc) in the solution and see what objects depend on it and what objects it depends on in turn. This is great in a complex system where you need to make a change and find out what it impacts.</p>
<p>Now it&#8217;s not perfect &#8211; it seems to skip documenting some sources, such as flat files, so they get missed in the impact analysis. And the level of granularity is at the object level &#8211; for example you can&#8217;t see the impact of an individual column change, just at the table level &#8211; but it&#8217;s still a great start and a useful tool.</p>
<p><strong>Snapshot comparison:</strong> A final piece of value which can be useful in troubleshooting. BI Documenter takes snapshots of your solution to document &#8211; and you can compare these to identify changes in the solution. The detail level is pretty good and will be a great place to start tracing changes in your system when your source control systems fail.</p>
<h2>Conclusions</h2>
<p>Is the tool worth it? At a maximum cost of US$500 a seat, it&#8217;s definitely got a place somewhere in your organisation. The documentation tool is of limited use but the Data Lineage and Snapshot comparisons are worth the cost of the product. Full details here: <a title="Pragmatic Works - BI Documenter" href="http://pragmaticworks.com/Products/Business-Intelligence/BIDocumenter/Default.aspx">Pragmatic Works &#8211; BI Documenter</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/08/bi-documenter-from-pragmatic-works/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SSRS &amp; Stored Procedures</title>
		<link>http://www.bimonkey.com/2010/07/ssrs-stored-procedures/</link>
		<comments>http://www.bimonkey.com/2010/07/ssrs-stored-procedures/#comments</comments>
		<pubDate>Mon, 19 Jul 2010 01:47:17 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Reporting Services]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=790</guid>
		<description><![CDATA[As an ETL Monkey, my experience with reports has been a bit incidental. One thing that puzzled me though is why report developers always wrote stored procedures to generate data for their reports, instead of using the SQL capabilities within the report. They said &#8220;Best Practice&#8221; and I was happy to leave them to it!
In [...]]]></description>
			<content:encoded><![CDATA[<p>As an ETL Monkey, my experience with reports has been a bit incidental. One thing that puzzled me though is why report developers always wrote stored procedures to generate data for their reports, instead of using the SQL capabilities within the report. They said &#8220;Best Practice&#8221; and I was happy to leave them to it!</p>
<p><a title="SSRS - Should I Use Embedded TSQL Or A Stored Procedure?" href="http://jahaines.blogspot.com/2009/11/ssrs-should-i-use-embedded-tsql-or.html">In this detailed post Adam Haines explains why</a>. It&#8217;s very detailed so here are the key points if you have a short attention span:</p>
<ul>
<li>The query can be maintained independently of SSRS, allowing tuning the query without accessing or modifying the reports</li>
<li>The execution plans can be cached if you use a Stored Procedure, but not if the query is embedded in the SSRS report</li>
<li>Stored procedures allow the use of certain objects that cannot be used in embedded T-SQL in the report such as temp tables and indexes specific to those temp tables as well as table variables</li>
<li>Stored Procedures provide a layer of abstraction between the report and the business logic</li>
<li>Stored Procedures allow re-use of similar logic</li>
</ul>
<p>Credit for the above list to my colleague John Simon who authored most of the above points in an internal discussion.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/07/ssrs-stored-procedures/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Row Sampling Transformation</title>
		<link>http://www.bimonkey.com/2010/06/the-row-sampling-transformation/</link>
		<comments>http://www.bimonkey.com/2010/06/the-row-sampling-transformation/#comments</comments>
		<pubDate>Sun, 06 Jun 2010 12:30:40 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Row Sampling]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=683</guid>
		<description><![CDATA[It&#8217;s been a long time since I did one of these! In this post I will be covering the Row Sampling Transformation. The sample package can be found here for 2005 and guidelines on use are here.
What does the Row Sampling Transformation do?
The Row Sampling Transformation takes a fixed number of rows from a source [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignnone" style="width: 179px"><img class="  " title="Row Sampling Transformation" src="http://www.bimonkey.com/uploads/componentreview/rowsampling1.jpg" alt="Fig 1: The Row Sampling Transformation" width="169" height="64" /><p class="wp-caption-text">Fig 1: The Row Sampling Transformation</p></div>
<p>It&#8217;s been a long time since I did one of these! In this post I will be covering the Row Sampling Transformation. The sample package can be found <a title="SQL 2005 SSIS Row Sampling Transformation Sample (Right Click, Save As)" href="http://www.bimonkey.com/uploads/componentreview/Row%20Sampling%20Transformation%20Basics%202005.dtsx">here for 2005</a> and guidelines on use are <a title="Using samples from BI Monkey" href="http://www.bimonkey.com/support/using-ssis-samples-from-this-site/">here</a>.</p>
<h2>What does the Row Sampling Transformation do?</h2>
<p>The Row Sampling Transformation takes a fixed number of rows from a source data set &#8211; in a similar manner to the <a title="The Percentage Sampling Transformation" href="http://www.bimonkey.com/2009/06/the-percentage-sampling-transformation/">Percentage Sampling Transformation</a>, except that instead of a proportion of your data, it takes a fixed number of rows. It splits your data set into two sets, the Sampled and Unsampled outputs, as below where 10 rows of a 100 row data set have been sampled:</p>
<div class="wp-caption alignnone" style="width: 629px"><img class=" " title="The Row Sampling Transformation outputs" src="http://www.bimonkey.com/uploads/componentreview/rowsampling3.jpg" alt="The Row Sampling Transformation outputs" width="619" height="235" /><p class="wp-caption-text">Fig 2: The Row Sampling Transformation outputs</p></div>
<p>The assigning of rows to an output is nominally random, but given the same data set and random seed (explained below), the same rows will always be selected each time you run the package.</p>
<h2>Configuring the Row Sampling Transformation</h2>
<p>There are two important properties to configure on the transformation. The first is the number of rows, which determines how many rows will fall into the Sample output. The second is the random seed, which tells the random selection algorithm which rows to choose. If you fix the seed, you will get consistent results &#8211; if you understand a little about randomisation in computing, you will know that randomness is a bit of a relative concept to a computer. If you leave the random seed checkbox unselected, the transformation will pick a seed based on the operating system&#8217;s tick count, so results will change from run to run.</p>
<p>You can also name your Sample and Unselected outputs, should you wish. It&#8217;s worth noting that you aren&#8217;t obliged to actually use either output downstream of the component, so you can use this component to select a fixed number of rows from your source &#8211; or ignore a fixed number of rows from your source, by only using the Unselected output.</p>
<div class="wp-caption alignnone" style="width: 481px"><img class="   " title="Configuring the Row Sampling Transformation" src="http://www.bimonkey.com/uploads/componentreview/rowsampling2.jpg" alt="Configuring the Row Sampling Transformation" width="471" height="356" /><p class="wp-caption-text">Fig 3: Configuring the Row Sampling Transformation</p></div>
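<p>The effect of fixing the seed is easy to demonstrate outside SSIS. The sketch below is not the transformation&#8217;s actual algorithm (which isn&#8217;t documented here); it is a plain partial shuffle in C#, with made-up names, that shows the behaviour described above: the same seed over the same number of rows always selects the same rows. It assumes the sample size is no larger than the row count.</p>
<blockquote><p>using System;<br />
using System.Linq;</p>
<p>static class RowSamplingDemo<br />
{<br />
    // Hypothetical helper: pick sampleSize row indexes out of rowCount using a seeded generator<br />
    public static int[] SampleRowIndexes(int rowCount, int sampleSize, int seed)<br />
    {<br />
        Random rng = new Random(seed);<br />
        int[] indexes = new int[rowCount];<br />
        for (int i = 0; i != rowCount; i++) indexes[i] = i;</p>
<p>        // Partial Fisher-Yates shuffle: only the first sampleSize slots need settling<br />
        for (int i = 0; i != sampleSize; i++)<br />
        {<br />
            int j = rng.Next(i, rowCount);   // random position from i to the end<br />
            int swap = indexes[i];<br />
            indexes[i] = indexes[j];<br />
            indexes[j] = swap;<br />
        }</p>
<p>        int[] sample = new int[sampleSize];<br />
        Array.Copy(indexes, sample, sampleSize);<br />
        return sample;<br />
    }</p>
<p>    static void Main()<br />
    {<br />
        // Fixed seed: two runs over a 100 row data set pick the same 10 rows<br />
        int[] first = SampleRowIndexes(100, 10, 42);<br />
        int[] second = SampleRowIndexes(100, 10, 42);<br />
        Console.WriteLine(first.SequenceEqual(second));   // True</p>
<p>        // Tick count seed: the selection will generally differ between runs<br />
        int[] third = SampleRowIndexes(100, 10, Environment.TickCount);<br />
        Console.WriteLine(string.Join(",", third.Select(x => x.ToString()).ToArray()));<br />
    }<br />
}</p></blockquote>
<p>A fixed seed is what you want for repeatable test runs; leave the seed to the tick count when you want a fresh sample each time.</p>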
<h2>Where should you use the Row Sampling Transformation?</h2>
<p>The main use for this would be to select a fixed size subset of data. This subset could be used for Data Mining test sets, or for limiting your data set size when testing packages &#8211; e.g. if you are running against a multimillion row data source, you could just run the package with 100 rows to see if your processes worked.</p>
<p>MSDN Documentation for the Row Sampling Transformation can be found here for <a title="SQL 2008 Row Sampling Transformation Documentation on MSDN" href="http://msdn.microsoft.com/en-us/library/ms141200.aspx">2008</a> and here for <a title="SQL 2005 Row Sampling Transformation Documentation on MSDN" href="http://msdn.microsoft.com/en-us/library/ms141200(SQL.90).aspx">2005</a>.</p>
<p>If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/06/the-row-sampling-transformation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An SQL alternative to the SCD</title>
		<link>http://www.bimonkey.com/2010/05/an-sql-alternative-to-the-scd/</link>
		<comments>http://www.bimonkey.com/2010/05/an-sql-alternative-to-the-scd/#comments</comments>
		<pubDate>Tue, 11 May 2010 10:47:34 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[merge]]></category>
		<category><![CDATA[Slowly Changing Dimension]]></category>
		<category><![CDATA[T-SQL]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=745</guid>
		<description><![CDATA[In SQL 2008 a new T-SQL construct was added - the MERGE operation. (Ok, pedants will know this wasn&#8217;t new to Oracle,  but it was new to SQL Server).
This operation allows for the merging of a dataset into a reference dataset &#8211; which can be remarkably similar to Insert / Update operations effected by the Slowly Changing Dimension transformation. [...]]]></description>
			<content:encoded><![CDATA[<p>In SQL 2008 a new T-SQL construct was added - the <a title="MERGE (Transact-SQL)" href="http://technet.microsoft.com/en-us/library/bb510625.aspx">MERGE</a> operation. (Ok, pedants will know this wasn&#8217;t new to Oracle,  but it was new to SQL Server).</p>
<p>This operation allows for the merging of a dataset into a reference dataset &#8211; which can be remarkably similar to the Insert / Update operations effected by the <a title="Slowly Changing Dimension Transformation" href="http://msdn.microsoft.com/en-us/library/ms141715.aspx">Slowly Changing Dimension</a> transformation. However, the way it operates is very different. Instead of the SCD&#8217;s row by row evaluation approach, the MERGE operation is a set based operation. What this means is it compares the whole of the source dataset to the reference dataset in a single pass. This has significant implications for performance &#8211; on a site where I implemented this, an operation which took 1,200 seconds in the SCD was cut down to 51 seconds using a MERGE.</p>
<p>There are limitations and differences to be aware of:</p>
<ul>
<li>You cannot directly return row counts for Insert / Update / Ignore operations in the Merge</li>
<li>As it is a bulk operation, a single bad row will cause failure of the whole batch</li>
<li>There&#8217;s no GUI &#8211; just hand crafted SQL</li>
<li>Fewer error trapping / logging options</li>
<li>More flexibility in terms of actions when matches / non matches are found</li>
</ul>
<p>The main reason to consider the SQL MERGE is that it handles Type 1 and, with a little cunning, Type 2 dimensions in a fraction of the time it takes the SCD to plod through. It&#8217;s still not as fast as a proper in-memory comparison using something such as <a title="TableDifference" href="http://www.cozyroc.com/ssis/table-difference">TableDifference</a> &#8211; but it&#8217;s always good to know you have something else available in your toolkit.</p>
<p>Further information:</p>
<ul>
<li><a title="Using the SQL MERGE Statement for Slowly Changing Dimension Processing" href="http://www.kimballgroup.com/html/08dt/KU107_UsingSQL_MERGESlowlyChangingDimension.pdf">Using the SQL MERGE Statement for Slowly Changing Dimension Processing</a> &#8211; from the Kimball Group</li>
<li><a title="Alternatives to SSIS SCD Wizard Component" href="http://bennyaustin.wordpress.com/2010/05/29/alternatives-to-ssis-scd-wizard-component/">How to create type 1 &amp; 2 SCD&#8217;s using standard SSIS components (other than the SCD)</a> (at the bottom of the post) &#8211; <a title="Benny Austin" href="http://bennyaustin.wordpress.com/">Benny Austin</a></li>
</ul>
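<p>For a feel of what the hand crafted SQL looks like, here is a rough sketch of a Type 1 (overwrite) MERGE run from a small C# program. Everything specific in it is an assumption for illustration: the connection string, the dbo.DimCustomer and staging.Customer tables and their columns are all hypothetical, and a real dimension would carry more attributes and probably only update rows that have actually changed.</p>
<blockquote><p>using System;<br />
using System.Data.SqlClient;</p>
<p>class MergeDimension<br />
{<br />
    static void Main()<br />
    {<br />
        // Hypothetical connection string, tables and columns<br />
        string connectionString = "Data Source=localhost;Initial Catalog=DW;Integrated Security=SSPI;";</p>
<p>        // One set based statement: update matching rows, insert the rest<br />
        string mergeSql =<br />
            "MERGE dbo.DimCustomer AS tgt " +<br />
            "USING staging.Customer AS src " +<br />
            "    ON tgt.CustomerCode = src.CustomerCode " +<br />
            "WHEN MATCHED THEN " +<br />
            "    UPDATE SET tgt.CustomerName = src.CustomerName " +<br />
            "WHEN NOT MATCHED BY TARGET THEN " +<br />
            "    INSERT (CustomerCode, CustomerName) " +<br />
            "    VALUES (src.CustomerCode, src.CustomerName);";</p>
<p>        using (SqlConnection connection = new SqlConnection(connectionString))<br />
        using (SqlCommand command = new SqlCommand(mergeSql, connection))<br />
        {<br />
            connection.Open();<br />
            // Returns one combined figure for the rows touched by the statement<br />
            int rowsAffected = command.ExecuteNonQuery();<br />
            Console.WriteLine("Rows inserted or updated: " + rowsAffected);<br />
        }<br />
    }<br />
}</p></blockquote>
<p>Note that the call hands back a single combined count, which is the row count limitation from the list above in practice. The same statement can just as easily sit in an Execute SQL Task inside the package.</p>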
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/05/an-sql-alternative-to-the-scd/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Managing your history data</title>
		<link>http://www.bimonkey.com/2010/05/managing-your-history-data/</link>
		<comments>http://www.bimonkey.com/2010/05/managing-your-history-data/#comments</comments>
		<pubDate>Tue, 11 May 2010 00:11:38 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Analysis Services]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=762</guid>
		<description><![CDATA[This post is to an extent a small rant about some design decisions I have been constrained by on my current project. These decisions were made predominantly for one fairly bad reason: it made the architect&#8217;s life easier (apologies to the architects if they are reading &#8211; but these were bad choices!)
The design choices in question [...]]]></description>
			<content:encoded><![CDATA[<p>This post is to an extent a small rant about some design decisions I have been constrained by on my current project. These decisions were made predominantly for one fairly bad reason: it made the architect&#8217;s life easier (apologies to the architects if they are reading &#8211; but these were bad choices!)</p>
<p>The design choices in question are around the managing of history data. In one component of the system it relates to Database storage design, the other relates to Cube storage design. In both cases the history data is stored in a separate location to the &#8220;current&#8221; data.</p>
<h2>Databases: Why separate history tables are a bad idea</h2>
<p>The first &#8211; and most compelling &#8211; reason for not storing your history data in separate tables to your current tables is that it increases complexity for users. Instead of having one location to look for data, your users now have to use two.</p>
<p>The second compelling reason is that there is no point to doing this from a storage point of view. SQL 2005 &amp; 2008 (<span style="text-decoration: underline;">Enterprise</span> editions only, admittedly) provide <a title="Partitioned Tables and Indexes in SQL Server 2005" href="http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx">partitioning</a>. This enables the contents of an individual table to be stored in separate locations on different <a title="Using Files and Filegroups" href="http://msdn.microsoft.com/en-us/library/ms187087.aspx">filegroups</a>. This means that you can store your current day&#8217;s data in one location and your history in a different one. The reason for doing this is the same as splitting it into separate tables &#8211; that querying the current section will be faster than the historic section. In theory, queries against partitioned tables should in fact be faster as the current data is now no longer in the same filegroup as the history data.</p>
<p>Now, there is an overhead associated with designing and maintaining partitions but I don&#8217;t see that it is significantly larger than that required to deal with the process required to archive data into separate tables on a daily basis. Additionally when maintaining separate history tables, you need to separate out every single table, whether it gets 10 rows a day or 10 million. With partitioning you can just target the large tables that need that focus.</p>
<p>There are other downsides to maintaining separate tables. If you make a change to a table design, you need to do it in 2 places.  You also need to remember to update your history processes. If your history process fails, you can end up with users getting unexpected query results or ETL process failures when the system loads the next day&#8217;s data into the current tables &#8211;  and untangling it becomes a real mess. If your partition processes fail to run, you just have too much data in one filegroup for a while &#8211; unlikely to be fatal.</p>
<p>So if you have large tables you need to split out for performance purposes &#8211; do it at the back end, using the power of the database &#8211; which <strong>is</strong> designed to store data efficiently. Keep it away from the users &#8211; they neither need to know nor care about your need to keep the data separate. If you want to give them a single object to query with the current day&#8217;s data, just use views.</p>
<h2>Cubes: Why a separate history Cube is a bad idea</h2>
<p>Much of the above applies here &#8211; <a title="Managing Analysis Services Partitions" href="http://msdn.microsoft.com/en-us/library/ms175604.aspx">SSAS also has partitions</a> &#8211; so you can again store your historic and current data in separate physical locations with the users being totally unaware of this. Again there is overhead in maintenance, but this will also balance out with the maintenance and risks associated with maintaining two identical cubes that only differ in terms of data source.</p>
<h2>Use your storage options!</h2>
<p>So without banging on about the same things any further, please consider the following two points whenever considering managing your history data:</p>
<ol>
<li>How does what I&#8217;m planning affect my users?</li>
<li>How does what I&#8217;m planning leverage the platform&#8217;s capabilities?</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/05/managing-your-history-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>AUSSUG Followup</title>
		<link>http://www.bimonkey.com/2010/04/aussug-followup/</link>
		<comments>http://www.bimonkey.com/2010/04/aussug-followup/#comments</comments>
		<pubDate>Thu, 15 Apr 2010 11:22:21 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Framework]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=758</guid>
		<description><![CDATA[After the great session at AUSSUG on Tuesday, I thought I&#8217;d share the presentation materials, so here are my PowerPoint slides. I enjoyed the shorter, more interactive format.
I&#8217;ve also been busy updating the documentation for the BI Monkey SSIS ETL Framework on the CodePlex site, and have uploaded a fresh alpha release &#8211; which is significantly [...]]]></description>
			<content:encoded><![CDATA[<p>After the great session at AUSSUG on Tuesday, I thought I&#8217;d share the presentation materials, so <a title="ETL Frameworks Presentation" href="http://www.bimonkey.com/uploads/framework/ETL Frameworks Presentation.zip">here are my PowerPoint slides</a>. I enjoyed the shorter, more interactive format.</p>
<p>I&#8217;ve also been busy updating the documentation for the BI Monkey SSIS ETL Framework on the <a title="BI Monkey SSIS ETL Framework" href="http://ssisetlframework.codeplex.com">CodePlex site</a>, and have uploaded a fresh alpha release, which is significantly more tested and reliable. It&#8217;s already had quite a few downloads, so I&#8217;m looking forward to some feedback soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/04/aussug-followup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The BI Monkey SSIS ETL Framework</title>
		<link>http://www.bimonkey.com/2010/04/the-bi-monkey-ssis-etl-framework/</link>
		<comments>http://www.bimonkey.com/2010/04/the-bi-monkey-ssis-etl-framework/#comments</comments>
		<pubDate>Sun, 11 Apr 2010 11:01:34 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[BI Monkey News]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Framework]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=749</guid>
		<description><![CDATA[I have, after at least 3 incarnations of building SSIS-based ETL control frameworks, decided to do a fourth and (possibly) final one &#8211; for all to share. The alpha release of this is available now on CodePlex @ http://ssisetlframework.codeplex.com. I&#8217;ve put an alpha up, rather than a better-tested beta, because the youngest BI [...]]]></description>
			<content:encoded><![CDATA[<p>I have, after at least 3 incarnations of building SSIS-based ETL control frameworks, decided to do a fourth and (possibly) final one &#8211; for all to share. The alpha release of this is available now on CodePlex @ <a title="BI Monkey SSIS ETL Framework" href="http://ssisetlframework.codeplex.com">http://ssisetlframework.codeplex.com</a>. I&#8217;ve put an alpha up, rather than a better-tested beta, because the youngest BI Monkey in my household appears to be unhappy being a baby right now and is taking me away from this side project.</p>
<p>It&#8217;s fairly standard stuff from my point of view &#8211; it&#8217;s a metadata-driven framework that consists of a single control package and a template execution package. I&#8217;ve aimed to include the following features, all of which can be turned on or off in the metadata without altering any code.</p>
<ul>
<li>Recoverability</li>
<li>Extraction constraints</li>
<li>Execution order</li>
<li>Dependencies</li>
<li>Failure handling</li>
</ul>
<p>The SSIS packages are pretty much a simple interface. The meat of the decision making occurs in a bundle of stored procedures, where it&#8217;s a lot easier to write and maintain complex decisions about processing than in SSIS.</p>
<p>I&#8217;m adhering to my own self-set principles on usability as well, namely to hit these targets:</p>
<ul>
<li>High visibility of operation via reports</li>
<li>Fully commented code</li>
<li>No acronyms in table and field names</li>
<li>Any codes used exposed in a metadata table</li>
</ul>
<p>I will be doing a demo of it at the AUSSUG session in Sydney this coming Tuesday as part of the SSIS round table which is taking place:</p>
<blockquote><p><strong>James Beresford:  ETL Frameworks</strong></p>
<p>Design and implementation considerations when building ETL control frameworks, including the debut demo of the BI Monkey ETL Framework</p>
<p><strong>Kevin Wong:  Practical ADO.NET</strong></p>
<p>Three practical showcases using ADO.NET in SSIS, including re-usable in-memory resultsets</p>
<p><strong>Glyn Llewellyn:  Putting the T back into ETL</strong></p>
<p>Looking at alternative ways in SSIS to perform data transformations and correction and discussing the advantages and disadvantages of each method</p>
<p><em>(Full details at the <a title="Australian SQL Server User Group" href="http://www.sqlserver.org.au/">AUSSUG website</a>)</em></p></blockquote>
<p>I hope to see you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2010/04/the-bi-monkey-ssis-etl-framework/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
