PowerPivot Succinctly – new eBook by me!

Want to learn more about PowerPivot? Syncfusion have published a new eBook “PowerPivot Succinctly” written by my good self.

A quick overview of the contents:

  1. PowerPivot Model Basics
  2. Using your PowerPivot Model
  3. Sharing your PowerPivot Model
  4. A Note on Instability
  5. Deep Dive: The xVelocity Engine

It’s a free download – so go help yourself!

Read More

Issuing a Hive query against Hadoop on Azure using Excel

..no, seriously. I can query an Azure cloud based Hive instance of Hadoop from Excel. Its simple stuff.

First step is to install the Hive drivers and Excel add-in. This then gives you a new button on your Excel ribbon:

The Hive Excel Add-In
The Hive Excel Add-In

Clicking this opens up the query pane, which is pretty simple stuff:

Hive Excel Add-in Query Pane
Hive Excel Add-in Query Pane

Pick your Hive instance, choose the table (in this case just the standard sample table), pick your columns…  then scroll down a bit because the pane is a bit long when you expand out the options.

Hive Excel Add-in Query Pane
Hive Excel Add-in Query Pane

I passed on providing any Criteria (i.e. the WHERE clause), added in some Aggregations, skipped over ordering as I can do that easily enough in Excel, added a 2k row limit.

This spat out some HiveQL (Hive Query language) which I modified slightly to include a count(*). Then click Execute Query, and wait a bit for the Big Data Bees to make my Insights.

Then:

Hive Data in Excel
Hive Data in Excel

I have Hive Data in Excel. I could have put it in PowerPivot if I really wanted to show off…

Read More

The Excel Source and Connection Manager – The basics

In this post I will be reviewing the Excel Source and Excel Connection Manager. The sample package and files can be found here for 2008 and here for 2005 and guidelines on use are here.

t2
Fig 1: The SSIS Excel Source

How do you read data from an Excel Workbook in SSIS?

The answer to that is, it depends on the version of Excel . If it is 2003 or earlier, you can use the the Excel Connection Manager and Excel Source. If it is 2007 or later, you use a specially configured OLE DB Connection Manager, as described in the MSDN article How to: Connect to an Excel Workbook. An example of this is in the sample package (Data Flow 4). Though the sample package for this post includes an example of each version of Excel, but I will only be discussing the Excel source for 2003 and earlier from here on.

Configuring the Excel Connection Manager

The Excel connection manager is pretty simple to set up – all it requires is the file path, Excel version, and to know whether it has Column Names in the first row.

t2
Fig 2: The Excel Connection Manager

If you look at the Properties of the Connection Manager once it is set up, you can see a Password field – this is misleading – you cannot connect to a password protected workbook. So, if you have to connect to a secure workbook you need to look at either other means of extracting that data or alternative security for the workbook.

Configuring the Excel Source

The Excel Source is very similar to the OLE DB source. This can initially be confusing as in the Data Access Mode drop-down it talks in terms of Tables, Views and SQL Commands. When it says Table or View, what it means in Excel speak is Sheets and Named Ranges. When it talks in terms of SQL – it really means it. You can construct SQL statements to pull restricted amounts of, or modified versions of the spreadsheet data. An example is below:

t2
Fig 3: SQL in the Excel Source

Just remember to qualify the Sheet / Range Name with square brackets – e.g. [Sheet1$]- if hand writing code. Examples of each type of access is available in Data Flows 1 – 3 in the sample package.

Summary

Use the Excel Source and Excel Connection manager when reading from workbooks from Excel 2003 and prior. Be aware the driver behind it can behave unexpectedly at times, and it is worth paying attention to the “Usage Considerations” section of the MSDN documentation if you are having unexpected results.

Documentation for the Excel Source can be found here for 2008 and here for 2005. Similarly Documentation for the Excel Connection Manager can be found here for 2008 and here for 2005.

Read More

Kilimanjaro, Projects Gemini and Madison Webcast

For those who haven’t seen much of Project Gemini but have heard the buzz, this TechNet Webcast: An Early look at SQL Server ‘Kilimanjaro’ and project ‘Madison’ – will give you a good insight. It also has some features on reusable Reporting Services components which look very impressive and info on Project Madison, which provides scalability features. Registration as usual is a pain, and forget trying to use the site using any browser other than IE – I wish Microsoft would make their content easier to access.

Anyway, onto the webcast – about the 1st quarter of the webcast is the usual generic roadmap blurb, but then the presenter gets into the real meat of Gemini – an Excel based ‘in memory’ analysis tool that allows joining between entities without having to know about such things, superfast analytics – pivoting, calculation, charts etc. and then being able to publish to Sharepoint. From an OLAP analysis point of view, the Pivot Tables also has slicers (effctively table wide filters) displayed in the spreadsheet as well, and it would be good if that made it into Pivot Tables generally in the next release of Office. It looks like an incredible tool and very easy to use – and may be a powerful step towards the realisation of the ‘BI for the masses’ vision. The presenter did let slip one weakness though – much has been made of the 100 million rows of data demo – but that data still has to be loaded into memory first and will still take significant time. I also suspect that how successful Gemini will be is going to depend on how much it will rely on good data structures being in place in an organisation to support it. The Data Warehouse is going to remain the core part of  any succesful BI delivery.

The next component of interest was the reusable Reporting Services components – there is the concept of a library of components that can be built – e.g. standard charts, logos, gauges etc – and then dropped into any report, either by a developer or a user in Report Builder 2.0.  What really grabbed my attention is that these components are version aware – i.e. if the library version of the component is updated then if you reopen the report in design mode it will let you know and give you the option to update. Again this points to ‘BI for the masses’ as you can have developers create some great components which any user can then drop in to their home grown reports. Plus as any developer knows, there’s a lot of repetition and any options for code re-use are always appreciated.

Finally, Project Madison was covered – and seems more about scalability up to multi-terabyte warehouses. It was a bit infrastructure focused for me so most of it passed me by, but clearly Microsoft are stepping up to try and address the market perception that they can’t scale.

Anyway, this all will be dropped in late 2010 as Kilimanjaro – an interim release of SQL Server.

Read More

Microsoft’s secret forecasting tool – the Office Suite

Last night I attended an IAPA presentation on basic forecasting concepts and the tools used, presented by the ever interesting Eugene Dubossarsky (of Presciient, an analytics consultancy).  I will skip over the forecasting content as for the Microsoft BI community, the interesting part is which tool he used for most basic forecasting activities. It was Excel. Then, when he needed to do more advanced work, he used – Excel. Only when he needed to do trickier stuff with larger amounts of data did he pull in a more heavyweight tool – Access.

That’s right – the office suite covers the majority of forecaster’s needs. SQL Server and Analysis Services didn’t get a look in until the really heavyweight analytics processes began. For his purposes however, Eugene much prefers R, an open source stats program that is free, very powerful and now a serious competitor to SAS – much to their annoyance. Microsoft are rumoured to be talking to the people behind R, and an acquisition would make sense for both sides – R is not user friendly, which Microsoft could provide help with – and adding the capabilities of R would allow Microsoft to take a slug at SAS’s BI market.

So, this shows that most users still aren’t fully aware of, let alone using Excel’s capabilites – otherwise they wouldn’t be paying analytics consultants to to use it for them. Microsoft are always pushing Excel further, so now i’ll cover two features of Excel that the power users may not be aware of. It’s easy to forget sometimes that the 2007 Office suite wasn’t just a new, pretty interface – it also added huge BI capabilities.

The Data Mining Add-In for Excel (download for SQL Server 2008 or 2005)

This Add-In allows you to leverage the Data Mining capabilities of Analysis Services through Excel. It allows you to use Excel as the front end for creating and working with Data Mining models that exist on your server. However what really makes it interesting for Excel users is that it allows you to perform Data Mining on your spreadsheet data.

There is a Virtual Lab here explaining and demonstrating their use.

Project Gemini

This feature is slated for the next release of Excel, and is an in-memory tool for analysing large amounts of data in an OLAP style, but without all the fiddly data modelling normally required. It is a clear slug at other players in the in-Memory market, such as QlikTech. The models created will also be able to be ported back to SSAS with minimum effort as well. For more details read this commentary from the OLAP Report.

Microsoft has one of the most powerful BI Tools in the world in Excel, users just need to be made aware!

Read More