<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BI Monkey &#187; Data Mining</title>
	<atom:link href="http://www.bimonkey.com/tag/data-mining/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bimonkey.com</link>
	<description>James Beresford on Microsoft BI and Consulting in Sydney, Australia</description>
	<lastBuildDate>Mon, 23 Jan 2012 22:01:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>The Term Extraction Transformation and &#8220;Animal Farm&#8221;</title>
		<link>http://www.bimonkey.com/2009/07/the-term-extraction-transformation/</link>
		<comments>http://www.bimonkey.com/2009/07/the-term-extraction-transformation/#comments</comments>
		<pubDate>Wed, 15 Jul 2009 08:25:00 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Term Extraction]]></category>
		<category><![CDATA[Text Mining]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=436</guid>
		<description><![CDATA[In this post I will be covering the Term Extraction Transformation. The sample package can be found here for 2005 and guidelines on use are here. Today&#8217;s exercise will be a fun one, as I&#8217;m going to apply the transformation to George Orwell&#8217;s book Animal Farm &#8211; a copy of which I obtained in text form [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignnone" style="width: 264px"><img title="The Term Extraction Transformation" src="http://www.bimonkey.com/uploads/componentreview/term1.jpg" alt="b" width="254" height="66" /><p class="wp-caption-text">Fig 1: The Term Extraction Transformation</p></div>
<p>In this post I will be covering the Term Extraction Transformation. The sample package can be found <a title="SQL 2005 SSIS Term Extraction Sample (Right Click, Save As)" href="http://www.bimonkey.com/uploads/componentreview/Term%20Extraction%20Transformation%20Basics%202005.zip">here for 2005</a> and guidelines on use are <a title="Using samples from BI Monkey" href="http://www.bimonkey.com/support/using-ssis-samples-from-this-site/">here</a>. Today&#8217;s exercise will be a fun one, as I&#8217;m going to apply the transformation to George Orwell&#8217;s book Animal Farm, a copy of which I obtained in text form from <a title="Project Gutenburg Australia" href="http://gutenberg.net.au/">Project Gutenberg Australia</a>.</p>
<h2>What does the Term Extraction Transformation do?</h2>
<p>In simplest terms, it extracts individual nouns, and noun phrases made up of nouns and adjectives, from text (these are the &#8220;Terms&#8221;) and returns them with a frequency count or score. In my example, a common <strong>Noun</strong> term is &#8220;Animal&#8221;, and a common <strong>Noun Phrase</strong> term is &#8220;Animal Farm&#8221;.</p>
<p>Because it uses an internal dictionary to simplify terms so that repeated elements can be identified (for example, by removing plurals), it only works with English text. The dictionary is not exposed and cannot be edited, nor can the component be pointed at a custom dictionary of your choosing. So, like the Fuzzy Lookup, it is something of a black box: the algorithms and dictionary are fixed, and you have little ability to tweak its operation. I&#8217;ll pick up some flaws with this later. The only real control you have over the content of the output is an <strong>Exclusion List</strong>, which lets you feed the component a list of terms to ignore.</p>
<h2>Configuring the Term Extraction Transformation</h2>
<div class="wp-caption alignnone" style="width: 369px"><img title="The Term Extraction Transformation - The Term Extraction Tab" src="http://www.bimonkey.com/uploads/componentreview/term2.jpg" alt="b" width="359" height="303" /><p class="wp-caption-text">Fig 2: The Term Extraction Tab</p></div>
<p>The first thing to configure is the input column on the &#8220;Term Extraction&#8221; tab. This transformation accepts a single input column, which must be either a Unicode text stream (DT_NTEXT) or a Unicode string (DT_WSTR). In the example package I&#8217;ve simply used a Data Conversion transformation to convert my non-Unicode input stream prior to the Term Extraction. You can also assign custom names to the Term and Score output columns.</p>
<div class="wp-caption alignnone" style="width: 578px"><img title="The Term Extraction Transformation - The Exclusion Tab" src="http://www.bimonkey.com/uploads/componentreview/term3.jpg" alt="b" width="568" height="234" /><p class="wp-caption-text">Fig 3: The Exclusion Tab</p></div>
<p>Next up is to specify your Exclusion List, if you are using one. This must be a single column in a table in either a SQL Server or Access database (<a href="https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=249043">apparently Excel is also an undocumented option</a>). In my example I have used the Name column of the AdventureWorks Departments table, so the names of any departments that appear in the text won&#8217;t appear in the output. Admittedly this is unlikely to matter in Animal Farm, but if you were mining your own website you might choose to ignore your company name, as it will appear often and tell you little.</p>
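<p>To make the exclusion behaviour concrete, here is a minimal Python sketch that filters already-extracted terms against an exclusion list. It assumes the terms arrive as (term, score) pairs, and it follows the MSDN documentation&#8217;s description that excluding a term also excludes any longer term that contains it (so excluding &#8220;farm&#8221; also drops &#8220;animal farm&#8221;). The function names and sample scores are hypothetical, and the case-insensitive, word-based matching is my assumption rather than the component&#8217;s documented algorithm.</p>

```python
def contains_phrase(term_words, phrase_words):
    """True if phrase_words appears as a contiguous run inside term_words."""
    n = len(phrase_words)
    return any(term_words[i:i + n] == phrase_words
               for i in range(len(term_words) - n + 1))


def apply_exclusion_list(terms, exclusions):
    """Drop every (term, score) pair whose term contains an excluded phrase.

    Matching is case-insensitive and word-based: an illustrative
    assumption, not the component's documented algorithm.
    """
    excluded = [e.lower().split() for e in exclusions if e.strip()]
    kept = []
    for term, score in terms:
        words = term.lower().split()
        if not any(contains_phrase(words, ex) for ex in excluded):
            kept.append((term, score))
    return kept


# Hypothetical extracted terms and scores.
terms = [("animal", 120), ("animal farm", 30), ("windmill", 25), ("farm", 40)]
print(apply_exclusion_list(terms, ["farm"]))
```

<p>Here both &#8220;farm&#8221; and &#8220;animal farm&#8221; are removed, while &#8220;animal&#8221; survives because it does not contain the excluded word.</p>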
<div class="wp-caption alignnone" style="width: 501px"><img title="The Term Extraction Transformation - Advanced Tab" src="http://www.bimonkey.com/uploads/componentreview/term4.jpg" alt="b" width="491" height="310" /><p class="wp-caption-text">Fig 4: The Advanced Tab</p></div>
<p>The final page is the most important in terms of affecting the output. <strong>Term Type</strong> controls whether the component returns Nouns, Noun Phrases, or both. <strong>Score type</strong> controls whether the score returned is a simple frequency count or the TFIDF (Term Frequency-Inverse Document Frequency), where the TFIDF of a term T = (frequency of T) &#215; log((#rows in input) / (#rows having T)). In practice this favours terms that occur often but in relatively few rows; a term that appears in every row scores zero, because log(1) = 0. <strong>Parameters</strong> sets the frequency threshold, which is the minimum number of times a term must occur before it is output (a setting of 1 would return every single noun and/or noun phrase found), and the maximum length of term, which is the maximum number of words in a term. Finally, <strong>Options</strong> sets the case sensitivity of the search.</p>
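<p>The TFIDF formula and the frequency threshold can be sketched in a few lines of Python. This assumes each input row has already been reduced to a list of terms, and uses the natural log because the formula does not state a base; the function name and sample rows are hypothetical, so treat the absolute scores as illustrative only.</p>

```python
import math
from collections import Counter


def tfidf_scores(rows, min_frequency=2):
    """Score terms with TFIDF = freq(T) * log(N / rows_having(T)).

    `rows` is a list of rows, each a list of already-extracted terms.
    The natural log is an assumption; the formula does not fix the base.
    """
    n_rows = len(rows)
    frequency = Counter()  # total occurrences of each term
    row_count = Counter()  # number of rows containing each term
    for row in rows:
        frequency.update(row)
        row_count.update(set(row))
    return {
        term: freq * math.log(n_rows / row_count[term])
        for term, freq in frequency.items()
        if freq >= min_frequency  # mimic the frequency threshold
    }


# Hypothetical rows of extracted terms.
rows = [
    ["animal", "farm", "animal"],
    ["animal", "windmill"],
    ["windmill", "windmill"],
    ["animal"],
]
print(tfidf_scores(rows))
```

<p>With these rows, &#8220;farm&#8221; falls below the threshold, and &#8220;windmill&#8221; (3 occurrences in 2 of the 4 rows) outscores &#8220;animal&#8221; (4 occurrences in 3 rows), because TFIDF rewards terms concentrated in fewer rows.</p>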
<h2>The Term Extraction Transformation&#8217;s dictionary limits</h2>
<p>The main problem with this component stems from its black-box dictionary, which limits how well it can handle real data. For example, despite the claim that it removes plurals, both Commandment and Commandments appear as distinct terms in the results of the example package. In real-world scenarios, such as mining emails or web pages, misspellings are common and product names are often nonsensical from a dictionary point of view; a custom dictionary would let you work around that. As it stands, you end up having to clean up the terms after extraction.</p>
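<p>One way to handle the Commandment/Commandments case is to clean up the output after extraction. The Python sketch below uses a deliberately naive, hypothetical rule, not anything the component itself does: it folds a term ending in &#8220;s&#8221; into its singular form only when that singular form was also extracted, summing the counts. The counts are made up for illustration.</p>

```python
from collections import defaultdict


def merge_naive_plurals(term_counts):
    """Fold 'Xs' into 'X' when both were extracted, summing their counts.

    Deliberately naive: it only strips a trailing 's', so irregular
    plurals are missed and unrelated pairs could be merged by mistake.
    """
    merged = defaultdict(int)
    for term, count in term_counts.items():
        singular = term[:-1] if term.endswith("s") else term
        key = singular if singular in term_counts else term
        merged[key] += count
    return dict(merged)


# Hypothetical counts from an extraction run.
counts = {"Commandment": 7, "Commandments": 5, "Snowball": 12}
print(merge_naive_plurals(counts))
```

<p>A rule this crude illustrates the point of the post: without access to the dictionary, every fix becomes a hand-written post-processing step.</p>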
<p>By supporting a custom dictionary, or allowing the built-in one to be extended through an inclusion table (the reverse of an exclusion table), this component would become more useful. I&#8217;ve raised a <a title="SSIS Term Extraction Transformation Custom Dictionary" href="https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=474073">Connect suggestion for this</a>; please vote it up if you think it will improve your lot. <em><span style="color: #ff0000;"><strong>Update 21/10/2010</strong></span>: The SSIS Team are not implementing this feature, which is a shame.</em></p>
<h2>When would you use the Term Extraction Transformation?</h2>
<p><a title="Douglas Laudenschlager @ http://dougbert.com/" href="http://dougbert.com/">Douglas Laudenschlager</a> comments <a title="Text Mining with Term Lookup and Term Extraction" href="http://dougbert.com/blogs/dougbert/archive/2008/05/24/text-mining-with-term-lookup-and-term-extraction.aspx">here on some scenarios envisaged by Microsoft Research in China for using terms within text data for mining</a>. The transformation is best suited to situations where you need to trawl through large amounts of (English) text to pull out common terms. One use I attempted when learning SSIS was to emulate the <a title="Quackometer" href="http://www.quackometer.net/">Quackometer</a>, a web-based tool that analyses web pages and tries to determine whether their content is valid science or junk science. I did this by pulling down the web pages as text, running them through the Term Extraction and then trying to detect common valid and junk science terms (using an Exclusion List to remove common HTML terms). I never finished it, but it remains a lurking project which may yet reappear on these pages.</p>
<p>MSDN Documentation for the Term Extraction Transformation can be found here for <a title="SQL 2008 Term Extraction Transformation Documentation on MSDN" href="http://msdn.microsoft.com/en-us/library/ms141809.aspx">2008</a> and here for <a title="SQL 2005 Term Extraction Transformation Documentation on MSDN" href="http://msdn.microsoft.com/en-us/library/ms141809(SQL.90).aspx">2005</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2009/07/the-term-extraction-transformation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A caution on using Dimensional DSVs in Data Mining &#8211; part 2</title>
		<link>http://www.bimonkey.com/2009/06/a-caution-on-using-dimensional-dsvs-in-data-mining-part-2/</link>
		<comments>http://www.bimonkey.com/2009/06/a-caution-on-using-dimensional-dsvs-in-data-mining-part-2/#comments</comments>
		<pubDate>Thu, 18 Jun 2009 02:08:31 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[DSV]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=343</guid>
		<description><![CDATA[As a follow-up to this post, I have found that using a table external to the one being mined to provide a grouping not only fails to group within the model, it also confuses the Mining Legend in the Mining Model Viewer. What I was seeing in the Mining Legend for a node [...]]]></description>
			<content:encoded><![CDATA[<p>As a followup to <a title="A caution on using Dimensional DSVs in Data Mining" href="http://www.bimonkey.com/2009/06/a-caution-on-using-dimensional-dsvs-in-data-mining/">this post</a> I have found that not only does using a table external to the one being mined to provide a grouping fail to actually group within the model, it also confuses the Mining Legend in the Mining Model Viewer.</p>
<p>What I was seeing in the Mining Legend for a node in a Decision Tree was like this:</p>
<p>Total Cases: 100</p>
<p>Category A: 10 Cases</p>
<p>Category B: 25 Cases</p>
<p>Category C: 0 Cases</p>
<p>Category D: 9 Cases</p>
<p>&#8230; so the total cases and the cases displayed didn&#8217;t tie up (10 + 25 + 0 + 9 = 44, not 100). By digging further with the Microsoft Mining Content Viewer and looking at the NODE_DISTRIBUTION, I saw that there were multiple rows for the categories, and the Mining Legend was just picking one of those values.</p>
<p>So if you find yourself looking at a node and wondering why the numbers don&#8217;t add up, it&#8217;s because your grouping hasn&#8217;t been used by the model.</p>
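<p>The mismatch can be reproduced with a small Python sketch. The rows below are hypothetical (ATTRIBUTE_VALUE, SUPPORT) pairs, loosely modelled on those NODE_DISTRIBUTION columns, with each category split across two rows. Keeping the first row per category is only an assumption about which value the legend picks, but it reproduces the 44 cases in the legend, while summing every row ties back to the node total of 100.</p>

```python
from collections import defaultdict

# Hypothetical NODE_DISTRIBUTION-style rows: (ATTRIBUTE_VALUE, SUPPORT).
# Each category is split across two rows, as happens when the grouping
# comes from a table outside the one being mined.
rows = [
    ("Category A", 10), ("Category A", 15),
    ("Category B", 25), ("Category B", 20),
    ("Category C", 0), ("Category C", 12),
    ("Category D", 9), ("Category D", 9),
]


def first_row_per_value(distribution):
    """Keep only the first row seen for each value (assumed legend behaviour)."""
    shown = {}
    for value, support in distribution:
        shown.setdefault(value, support)
    return shown


def summed_per_value(distribution):
    """Add up SUPPORT across every row that shares a value."""
    totals = defaultdict(int)
    for value, support in distribution:
        totals[value] += support
    return dict(totals)


legend = first_row_per_value(rows)
summed = summed_per_value(rows)
print(legend, sum(legend.values()))
print(summed, sum(summed.values()))
```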
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2009/06/a-caution-on-using-dimensional-dsvs-in-data-mining-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A caution on using Dimensional DSVs in Data Mining</title>
		<link>http://www.bimonkey.com/2009/06/a-caution-on-using-dimensional-dsvs-in-data-mining/</link>
		<comments>http://www.bimonkey.com/2009/06/a-caution-on-using-dimensional-dsvs-in-data-mining/#comments</comments>
		<pubDate>Mon, 15 Jun 2009 23:23:39 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[DSV]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=338</guid>
		<description><![CDATA[If you are using a dimensional-style DSV in a Data Mining project, such as below: Be aware that if you include a column from a Dimension table in your Mining Structure, the model will actually identify each key entry on the source table as a distinct value, rather than each distinct value in the Dimension [...]]]></description>
			<content:encoded><![CDATA[<div class="mceTemp">If you are using a dimensional-style DSV in a Data Mining project, such as below:</div>
<div class="wp-caption alignnone" style="width: 393px"><img title="A Dimensional DSV" src="http://www.bimonkey.com/uploads/ssas/dsv1.jpg" alt="b" width="383" height="223" /><p class="wp-caption-text">Fig 1: A Dimensional DSV</p></div>
<p>Be aware that if you include a column from a Dimension table in your Mining Structure, the model will actually identify each key entry on the <em>source</em> table as a distinct value, rather than each distinct value in the Dimension table. I found this out because I added a grouping category to one of my dimensional tables (a simple High/Medium/Low group), and there were multiple values in the attribute states for each grouping, as below:</p>
<div class="wp-caption alignnone" style="width: 218px"><img title="Mining Legend" src="http://www.bimonkey.com/uploads/ssas/mininglegend1.jpg" alt="b" width="208" height="157" /><p class="wp-caption-text">Fig 2: Mining Legend</p></div>
<p>To work around this, you will need to add a Named Calculation to bring the group onto the main table, or convert the main table to a Named Query.</p>
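<p>The difference between the two behaviours can be illustrated in Python. The dimension table, keys and High/Medium/Low labels below are hypothetical. The first count mimics the behaviour described in this post, where each key becomes its own state even though several keys share a group; the second mimics the workaround, where the group sits on the main table.</p>

```python
from collections import Counter

# Hypothetical dimension table: key -> grouping label.
dimension = {1: "High", 2: "High", 3: "Medium", 4: "Low", 5: "Low"}

# Hypothetical rows of the table being mined, holding only the foreign key.
fact_keys = [1, 2, 2, 3, 4, 5, 5, 5]

# Behaviour described in the post: every key is its own state, so "High"
# and "Low" each turn up as two separate states.
states_by_key = Counter((key, dimension[key]) for key in fact_keys)

# Workaround: the group lives on the main table (e.g. via a Named
# Calculation), so there is exactly one state per group.
states_by_group = Counter(dimension[key] for key in fact_keys)

print(sorted(states_by_key.items()))
print(sorted(states_by_group.items()))
```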
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2009/06/a-caution-on-using-dimensional-dsvs-in-data-mining/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Quick book review: Data Mining with SQL Server 2005</title>
		<link>http://www.bimonkey.com/2009/06/quick-book-review-data-mining-with-sql-server-2005/</link>
		<comments>http://www.bimonkey.com/2009/06/quick-book-review-data-mining-with-sql-server-2005/#comments</comments>
		<pubDate>Wed, 10 Jun 2009 13:50:08 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=322</guid>
		<description><![CDATA[I&#8217;ve just about squeezed all I can from Data Mining with SQL Server 2005 by ZhaoHui Tang and Jamie MacLennan &#8211; both of whom were part of the Data Mining development team for SQL Server 2005. This book provides a lot of what seems to be absent from BOL and MSDN &#8211; it goes through [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve just about squeezed all I can from <strong>Data Mining with SQL Server 2005</strong> by ZhaoHui Tang and Jamie MacLennan &#8211; both of whom were part of the Data Mining development team for SQL Server 2005.</p>
<p>This book provides a lot of what seems to be absent from BOL and MSDN: it goes through most facets of Data Mining using SQL Server reasonably thoroughly, but from a very technical angle. It is littered with big chunks of code and reads like technical documentation most of the way through. It doesn&#8217;t provide much insight into how to carry out effective Data Mining or interpret results; what little is there is useful, but it&#8217;s a slog to find it.</p>
<p>As a technical reference I&#8217;d recommend it, not least because of the dearth of decent documentation. If you&#8217;re a beginner trying to work out how to use the product to get results, you need to look elsewhere.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2009/06/quick-book-review-data-mining-with-sql-server-2005/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cannot View Data Mining Model in BIDS &#8211; function does not exist</title>
		<link>http://www.bimonkey.com/2009/06/cannot-view-data-mining-model-in-bids-function-does-not-exist/</link>
		<comments>http://www.bimonkey.com/2009/06/cannot-view-data-mining-model-in-bids-function-does-not-exist/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 02:08:56 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Analysis Services]]></category>
		<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=315</guid>
		<description><![CDATA[I&#8217;d been running some Naive Bayes Data Mining models without problems as part of initiating a Data Mining exercise, so it was time to move on and cut the data some different ways. So I set up a Decision Tree model and it processed fine, but when I tried to view it a message box [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;d been running some Naive Bayes Data Mining models without problems as part of initiating a Data Mining exercise, so it was time to move on and cut the data some different ways. So I set up a Decision Tree model and it processed fine, but when I tried to view it a message box appeared telling me it wasn&#8217;t going to co-operate:</p>
<p><span style="color: #ff0000;">The tree graph cannot be created because of the following error:</span></p>
<p><span style="color: #ff0000;">&#8216;Query (1,6) The </span></p>
<p><span style="color: #ff0000;">&#8216;[System].[Microsoft].[AnalysisServices].[System].[DataMining].[DecisionTrees].[GetTreeScores] function does not exist.&#8217;.</span></p>
<p><a title="Error while trying to load the mining model in the mining model viewer" href="http://social.msdn.microsoft.com/forums/en-US/sqldatamining/thread/2b51695a-460f-4bfd-8502-fc2fe8b15a63/ ">Fortunately someone had hit this before</a>, which was just as well, as the solution is rather obscure. The install I am working against is non-standard, being split across two drives. What had happened was that the path to the Data Mining DLLs set up during the install process didn&#8217;t match where the DLLs had actually been placed.</p>
<p>When I looked at the assembly properties (SSMS &gt; AS Server &gt; Assemblies &gt; System &gt; Properties), the Source Path referenced a DLL that didn&#8217;t actually exist, so it appears an incorrect path does not raise an error when the server starts. To fix it, I located where the DLL really was, then updated the config files where this path is stored, <span style="color: #800080;">System.0.asm.xml</span> and <span style="color: #800080;">VBAMDX.0.asm.xml</span>, to point to that path.</p>
<p>After a restart of the server, the models reprocessed and I could happily view the output!</p>
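<p>If you suspect the same problem, a quick check is to look for DLL paths in the config files that don&#8217;t exist on disk. The Python sketch below is a loose heuristic: it assumes the files are readable as UTF-8 text and that the paths are absolute Windows paths, and it deliberately makes no assumption about the XML layout of the .asm.xml files. The function name is hypothetical, and you would still need to find the files in your own Analysis Services data folder.</p>

```python
import re
from pathlib import Path

# Loose heuristic: an absolute Windows path (drive letter, colon, backslash)
# ending in ".dll", stopping at quotes or line breaks. No XML element names
# are assumed.
DLL_PATH = re.compile(r"[A-Za-z]:\\[^\"'\r\n]*?\.dll", re.IGNORECASE)


def missing_dll_paths(config_file):
    """Return the DLL paths mentioned in config_file that do not exist on disk."""
    text = Path(config_file).read_text(encoding="utf-8", errors="replace")
    return [path for path in DLL_PATH.findall(text) if not Path(path).exists()]
```

<p>Running it against System.0.asm.xml and VBAMDX.0.asm.xml would list any referenced DLL that isn&#8217;t where the config says it is; an empty list means the paths at least resolve.</p>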
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2009/06/cannot-view-data-mining-model-in-bids-function-does-not-exist/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Percentage Sampling Transformation</title>
		<link>http://www.bimonkey.com/2009/06/the-percentage-sampling-transformation/</link>
		<comments>http://www.bimonkey.com/2009/06/the-percentage-sampling-transformation/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 11:40:51 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Integration Services]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Percentage Sampling]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=293</guid>
		<description><![CDATA[In this post I will be covering the Percentage Sampling Transformation. The sample package can be found here for 2005 and guidelines on use are here. What does the Percentage Sampling Transformation do? This component is very simple &#8211; it splits a dataset by randomly directing rows to one of two possible outputs (as you [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignnone" style="width: 262px"><img title="The Percentage Sampling Transformation" src="http://www.bimonkey.com/uploads/componentreview/percentagesampling1.jpg" alt="b" width="252" height="69" /><p class="wp-caption-text">Fig 1: The Percentage Sampling Transformation</p></div>
<p>In this post I will be covering the Percentage Sampling Transformation. The sample package can be found <a title="SQL 2005 SSIS Percentage Sampling Transformation Sample Package (Right click, save as)" href="http://www.bimonkey.com/uploads/componentreview/Percentage%20Sampling%20Transformation%20Basics%202005.dtsx">here for 2005</a> and guidelines on use are <a title="Using samples from BI Monkey" href="http://www.bimonkey.com/support/using-ssis-samples-from-this-site/">here</a>.</p>
<h2>What does the Percentage Sampling Transformation do?</h2>
<p>This component is very simple: it splits a dataset by randomly directing rows to one of two possible outputs (as you can see in example 2 in the package, you can use just a single output if you want). All you need to decide is the proportion, as a whole percentage, in which you want the rows split between the two output data flows. The picture below shows the configuration options: the percentage split, the names of the two outputs and the Random Seed.</p>
<div class="wp-caption alignnone" style="width: 400px"><img title="Percentage Sampling Transformation Options" src="http://www.bimonkey.com/uploads/componentreview/percentagesampling2.jpg" alt="b" width="390" height="318" /><p class="wp-caption-text">Fig 2: The Percentage Sampling Transformation Options</p></div>
<p>The effect of the Random Seed can be seen in the sample package. If you run it multiple times you will get a different split each time, because when no seed is specified the package derives one from the operating system&#8217;s tick count (the number of milliseconds since the system was started), which changes from run to run. Note that in the example, even though the percentage sample is set to 30%, it&#8217;s unusual for the output rows to be split exactly 30:70. This is because each row&#8217;s output is decided by a throw of the randomiser&#8217;s dice. If you set a value for the Random Seed you fix the results of the throws and will always get the same rows sent to the same outputs (provided the input rows arrive in the same order), though there is still no guarantee the split will be exactly 30:70. As the dataset you split gets bigger, the actual proportions will tend to get closer to 30:70.</p>
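<p>The behaviour described above can be mimicked with a short Python sketch, assuming (as the uneven splits suggest) that each row&#8217;s output is decided by its own random draw. This is an illustration, not the transformation&#8217;s actual implementation: the function and output names are hypothetical, and when no seed is given Python seeds its generator from the system rather than from the tick count.</p>

```python
import random


def percentage_split(rows, percent, seed=None):
    """Randomly route each row to 'selected' with probability percent/100.

    Each row gets its own random draw, so the split is only approximately
    percent:(100 - percent). A fixed seed repeats the same draws, so the
    same input order always produces the same split.
    """
    rng = random.Random(seed)  # seed=None: seeded from the system
    selected, unselected = [], []
    for row in rows:
        if percent > rng.random() * 100:
            selected.append(row)
        else:
            unselected.append(row)
    return selected, unselected


rows = list(range(1000))
for seed in (1, 2, 3):
    selected, unselected = percentage_split(rows, 30, seed)
    print(seed, len(selected), len(unselected))
```

<p>Running the loop shows counts near, but rarely exactly at, 300 and 700; rerunning with the same seeds prints the same counts every time.</p>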
<div class="wp-caption alignnone" style="width: 619px"><img title="Percentage Sampling Transformation Results" src="http://www.bimonkey.com/uploads/componentreview/percentagesampling3.jpg" alt="b" width="609" height="147" /><p class="wp-caption-text">Fig 3: Percentage Sampling Transformation Results</p></div>
<h2>Where would you use this transformation?</h2>
<p>The main use for this, as far as Microsoft is concerned, is carving up data sets for Data Mining into training and test cases. But anywhere you need to divide a dataset randomly, e.g. separating out customers for a different targeted mailing, this is the component for the job.</p>
<p>MSDN Documentation for the Percentage Sampling Transformation can be found here for <a title="SQL 2008 Percentage Sampling Transformation Documentation on MSDN" href="http://msdn.microsoft.com/en-us/library/ms139864.aspx">2008</a> and here for <a title="SQL 2005  Percentage Sampling Transformation Documentation on MSDN" href="http://msdn.microsoft.com/en-us/library/ms139864(SQL.90).aspx">2005</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2009/06/the-percentage-sampling-transformation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Microsoft&#8217;s secret forecasting tool &#8211; the Office Suite</title>
		<link>http://www.bimonkey.com/2009/04/microsofts-secret-forecasting-tool-the-office-suite/</link>
		<comments>http://www.bimonkey.com/2009/04/microsofts-secret-forecasting-tool-the-office-suite/#comments</comments>
		<pubDate>Thu, 16 Apr 2009 23:51:32 +0000</pubDate>
		<dc:creator>BI Monkey</dc:creator>
				<category><![CDATA[Microsoft BI]]></category>
		<category><![CDATA[Access]]></category>
		<category><![CDATA[Analysis Services]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Office Suite]]></category>
		<category><![CDATA[Project Gemini]]></category>

		<guid isPermaLink="false">http://www.bimonkey.com/?p=98</guid>
		<description><![CDATA[Last night I attended an IAPA presentation on basic forecasting concepts and the tools used, presented by the ever interesting Eugene Dubossarsky (of Presciient, an analytics consultancy).  I will skip over the forecasting content as for the Microsoft BI community, the interesting part is which tool he used for most basic forecasting activities. It was [...]]]></description>
			<content:encoded><![CDATA[<p>Last night I attended an <a title="Institute of Analytics Professionals of Australia" href="http://www.iapa.org.au" target="_blank">IAPA</a> presentation on basic forecasting concepts and the tools used, presented by the ever interesting Eugene Dubossarsky (of <a title="Presciient is an independent consulting company that provides advisory and analytical services to businesses and government agencies" href="http://presciient.com/" target="_blank">Presciient</a>, an analytics consultancy).  I will skip over the forecasting content as for the <a title="Microsoft Business Intelligence" href="http://www.microsoft.com/BI/" target="_blank">Microsoft BI</a> community, the interesting part is which tool he used for most basic forecasting activities. It was <a title="Microsoft Excel" href="http://office.microsoft.com/excel">Excel</a>. Then, when he needed to do more advanced work, he used &#8211;  <a title="Microsoft Excel" href="http://office.microsoft.com/excel">Excel</a>. Only when he needed to do trickier stuff with larger amounts of data did he pull in a more heavyweight tool &#8211; <a title="Microsoft Access" href="http://office.microsoft.com/access">Access</a>.</p>
<p>That&#8217;s right &#8211; the office suite covers the majority of forecaster&#8217;s needs. <a title="SQL Server 2008 Overview, data platform, store data" href="http://www.microsoft.com/SQL/default.mspx">SQL Server</a> and <a title="Analysis Services" href="http://www.microsoft.com/sqlserver/2008/en/us/Analysis-Services.aspx">Analysis Services</a> didn&#8217;t get a look in until the really heavyweight analytics processes began. For his purposes however, Eugene much prefers <a title="The R Project for Statistical Computing" href="http://www.r-project.org/" target="_blank">R</a>, an open source stats program that is free, very powerful and now a serious competitor to <a title="SAS | Business Intelligence Software and Predictive Analytics" href="http://www.sas.com/" target="_blank">SAS </a>- much to their annoyance. Microsoft are rumoured to be talking to the people behind R, and an acquisition would make sense for both sides &#8211; R is not user friendly, which Microsoft could provide help with &#8211; and adding the capabilities of R would allow Microsoft to take a slug at SAS&#8217;s BI market.</p>
<p>So, this shows that most users <strong>still </strong>aren&#8217;t fully aware of, let alone using Excel&#8217;s capabilites &#8211; otherwise they wouldn&#8217;t be paying analytics consultants to to use it for them. Microsoft are always pushing Excel further, so now i&#8217;ll cover two features of Excel that the power users may not be aware of. It&#8217;s easy to forget sometimes that the 2007 Office suite wasn&#8217;t just a new, pretty interface &#8211; it also added huge BI capabilities.</p>
<p><strong>The Data Mining Add-In for Excel</strong> (download for SQL Server <a title="SQL Server 2008 Data Mining Add-In for Excel 2007" href="http://www.microsoft.com/downloads/details.aspx?displaylang=en&amp;FamilyID=896a493a-2502-4795-94ae-e00632ba6de7">2008</a> or <a title="SQL Server 2005 Data Mining Add-In for Excel 2007" href="http://www.microsoft.com/downloads/details.aspx?displaylang=en&amp;FamilyID=7c76e8df-8674-4c3b-a99b-55b17f3c4c51">2005</a>)</p>
<p>This Add-In allows you to leverage the <a title="Microsoft SQL Server 2008: Data Mining" href="http://www.microsoft.com/sqlserver/2008/en/us/data-mining.aspx" target="_blank">Data Mining</a> capabilities of Analysis Services through Excel, using Excel as the front end for creating and working with Data Mining models that exist on your server. However, what really makes it interesting for Excel users is that it allows you to perform Data Mining <em>on your spreadsheet data</em>.</p>
<p>There is a Virtual Lab <a title="MSDN Virtual Lab: Using the Microsoft SQL Server 2005 Data Mining Add-Ins for the 2007 Microsoft Office System" href="http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?culture=en-US&amp;EventID=1032346458&amp;EventCategory=3" target="_blank">here</a> explaining and demonstrating their use.</p>
<p><strong>Project Gemini</strong></p>
<p>This feature is slated for the next release of Excel. It is an in-memory tool for analysing large amounts of data in an OLAP style, but without all the fiddly data modelling normally required, and it is a clear slug at other players in the in-memory market, such as QlikTech. The models created will also be able to be ported back to SSAS with minimal effort. For more details read <a title="Project Gemini — Microsoft’s Brilliant OLAP Trojan Horse" href="http://www.olapreport.com/Comment_Gemini.htm">this commentary</a> from the OLAP Report.</p>
<p>In Excel, Microsoft has one of the most powerful BI tools in the world; users just need to be made aware of it!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bimonkey.com/2009/04/microsofts-secret-forecasting-tool-the-office-suite/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

