The APT of BI & Analytics

In the world of Information Security, an Advanced Persistent Threat (APT) “usually refers to a group, such as a government, with both the capability and the intent to persistently and effectively target a specific entity”.

I’ve written and tweeted and otherwise socialled the message about the threat automation poses to human cognitive labour. However, one thing I’ve skipped over – despite being an implicit part of it through my career choice – is the APT to human labour that the application of analytics within business represents.

Attempting to control labour productivity and costs has always been an important activity within any operation – more productivity per unit of labour at the lowest possible cost being the key aim (when was the last time you heard business groups advocating higher minimum wages?).

The Science of the Labour Analytics APT

BI & Analytics have enabled this to move from an art – i.e. “I, Bob’s manager, suspect he is a slacker, and should be fired” – to a science – i.e. “I, Bob’s manager, see he is costing more to employ than he generates in revenue, and should be fired.” To people working in sales this is nothing new – targets and bonuses have long been part of the way their performance is measured (gleefully ignoring the evidence that this is counter-productive). Now, however, everyone in the organisation can be assigned a “worth” which they must justify.

Traditionally some components are more easily allocated value – salespeople generate revenue, consultants can be sold – but areas that have been harder to measure are starting to fall into a metricisable state. For example, through analytics of customer satisfaction, it can be worked out which aspects of service – billing, support, service levels – actually matter, and consequently what the business should spend to get that function delivering the customer satisfaction needed to keep the customer. If support doesn’t really matter, don’t ask for a pay rise if you work in that department.

It’s not all dark side, of course – part of the metricisation of labour has meant that some improvements to working life have come along. The realisation that happy employees are more productive has led to companies paying more than lip service (read: the obligatory Christmas party and awkward team-building events) to keeping people happy and feeling like their work is worth expending effort on. So we can all look forward to fewer beatings.

Analysts are the Architects of Unemployment

It may be a bit harsh to suggest this, but I believe that alongside the roboticists, software developers, visionaries and other people building our future, analysts are key players in removing human labour from daily life. At the futurologists’ end of the deal they are designing the learning systems which will allow machines to think, but in the here and now they are creating the basis for working out which sections of a business need to be automated first.

At least the good news is that as an Analyst you will probably be one of the last to be fired… by the HR algorithm.

The Good Little Robot

Courtesy “The Good Little Robot” – http://www.thegoodlittlerobot.com/


Adventures in context with MDX

One of the key concepts in MDX is context – that a calculation’s results are affected by the slicers applied to the portion of the cube the calculation works with. Today I had some fun with getting the context right in a query through subcubes and WHERE clauses, so here are some useful lessons. Note the below examples run against the “Internet Operation” cube of the AdventureWorks tabular model.

Subcubes do not implicitly give context to calculated members

If you use a subcube to constrain your scope, as below, it doesn’t affect the context of the calculated members. So the below query, which creates a subcube by SELECTing FROM a secondary SELECT at the bottom:

WITH

MEMBER Measures.Context AS [Geography].[Geography].CurrentMember.UNIQUE_NAME

SELECT
{
Measures.Context
}
ON COLUMNS,

{
[Date].[Calendar].[Year]
*    [Date].[Month Name].Members
}
ON ROWS

FROM

// Subcube

(
SELECT
(
[Geography].[Geography].[Country Region].&[Australia]
,    [Date].[Calendar].[Year].&[2008]
)    ON COLUMNS

FROM [Internet Operation]
)

Gives a set of results like this:

MDX Subcube results

So note that despite the filter in the subcube to Australia, the calculated member is not aware of this scope limitation, and so it believes the CurrentMember of the Geography dimension is the default “All” member.

So, to give it the intended context, you need to include the Geography dimension on the rows, like below:

WITH

MEMBER Measures.Context AS [Geography].[Geography].CurrentMember.UNIQUE_NAME

SELECT
{
Measures.Context
}
ON COLUMNS,

{
[Geography].[Geography].[Country Region]
* [Date].[Calendar].[Year]
* [Date].[Month Name].Members
}
ON ROWS

FROM

// Subcube

(
SELECT
(
[Geography].[Geography].[Country Region].&[Australia]
, [Date].[Calendar].[Year].&[2008]
) ON COLUMNS

FROM [Internet Operation]
)

The addition of the Geography Dimension gives the calculated member the context to operate in, and the results are filtered to Australia by the filter in the subcube:

MDX Subcube results

Using a WHERE does give context to calculated members

Alternatively, using a WHERE clause does give context to calculated members:

WITH

MEMBER Measures.Context AS [Geography].[Geography].CurrentMember.UNIQUE_NAME

SELECT
{
Measures.Context
}
ON COLUMNS,

{
[Date].[Month Name].Members
}
ON ROWS

FROM [Internet Operation]

WHERE
(
[Geography].[Geography].[Country Region].&[Australia]
, [Date].[Calendar].[Year].&[2008]
)

As we can see from the results – we haven’t specified geography on the rows, just filtered in the WHERE – but the calculated member still has the context for the Geography dimension:

MDX WHERE results

Why is WHERE different from a Subcube?

The WHERE clause has a different effect to the subcube because it sets the query context for the Geography dimension before the calculated members are evaluated, so CurrentMember resolves to Australia. The subcube (a subselect) only restricts the space the query operates over – it does not change the current member of any hierarchy – so unless the dimension is explicitly placed on an axis, the calculated member still sees the default “All” member, as in the first example.

Passing Parameters to MDX Shared Datasets in SSRS

So part of my ongoing punishment work involved making some changes to an SSRS report that was running off some MDX, querying a Tabular Model. It needed Parameters to feed Shared Datasets… and the two had to talk nicely. This was not as easy as it seems…

First of all, a big warning to anyone wanting to mix DAX and MDX on the same report: the two won’t talk well to each other. So in the below examples, whilst the DAX queries to get the parameters would have been relatively easy to write, they ultimately had to feed an MDX query (as that was what was pre-existing). This is a problem because member names are not in the same syntax in DAX and MDX, so you can’t easily feed a selected member from a DAX resultset into an MDX operator like STRTOMEMBER.
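
To make the mismatch concrete, here’s a rough illustration – the table, column and hierarchy names are just AdventureWorks-style examples, not taken from the report in question. A DAX parameter query hands back plain column values, while MDX string functions expect a fully qualified unique name:

// A DAX parameter dataset returns plain column values...
EVALUATE VALUES ( 'Date'[Calendar Year] )
// ...e.g. 2007, 2008, 2009

// ...whereas an MDX operator such as STRTOMEMBER expects a unique member name:
// STRTOMEMBER ( "[Date].[Calendar].[Calendar Year].&[2008]" )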

So, what we will do in this post is set up a simple report with a user-selectable parameter that affects a shared dataset. Note the below examples run off AdventureWorks 2012.

Setting up a Parameter from MDX

Setting up an MDX Parameter to be usable is a little adventure in itself.  In order to be a Parameter that feeds into an MDX query, you need to pick up some additional Member properties along the way. So while this may get the members you want on the rows:

WITH
MEMBER    Measures.[Dummy] AS NULL

SELECT
{
[Measures].[Dummy]
}
ON COLUMNS,
{
[Date].[Calendar].[Calendar Year].Members
}
ON ROWS

FROM [Adventure Works]

It won’t play nicely when you try to use the member name in a function such as STRTOMEMBER, as it doesn’t return the unique member name, which that function needs to work properly. So you need to write something like this:

WITH
MEMBER    Measures.ParameterCaption AS [Date].[Calendar].CURRENTMEMBER.MEMBER_CAPTION
MEMBER    Measures.ParameterUniqueName AS [Date].[Calendar].CURRENTMEMBER.UNIQUENAME
MEMBER    Measures.ParameterLevel AS [Date].[Calendar].CURRENTMEMBER.LEVEL.ORDINAL

SELECT
{
[Measures].[ParameterCaption]
,    [Measures].[ParameterUniqueName]
,    [Measures].[ParameterLevel]
}
ON COLUMNS,
{
[Date].[Calendar].[Calendar Year].Members
}
ON ROWS 

FROM [Adventure Works]

Returning these member properties adds values that can be used more effectively in SSRS parameters. MEMBER_CAPTION gives the member name for display, UNIQUENAME gives the unique member name in the hierarchy so it can be used reliably in downstream MDX calculations, and finally the level ORDINAL number can be used if depth in the hierarchy is important.
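
To illustrate the shape of what comes back (the actual values will depend on your cube – these are just indicative AdventureWorks-style values):

ParameterCaption    ParameterUniqueName                          ParameterLevel
CY 2007             [Date].[Calendar].[Calendar Year].&[2007]    1
CY 2008             [Date].[Calendar].[Calendar Year].&[2008]    1

The ParameterUniqueName column is the important one – it is exactly the sort of string that STRTOMEMBER or STRTOSET can turn back into real members further downstream.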

So now, to set it up in SSRS. I won’t go into too much detail – you can get most of this from the sample, but essentially create a shared dataset with the MDX for the parameter, add a report, then within the report create a dataset that uses the shared dataset. Then, we add a parameter and set it up as below (click on the images to see the full screen – the screen grabs were too big).

First, give it a name and a user-friendly label, and allow multiple values (if you want):

SSRS MDX parameter setup properties

Then, we point the parameter at the shared dataset we created for it to get its values. Here we use the additional properties we added in the MDX query: the member UNIQUENAME as the Value field and the CAPTION as the Label field. The Value field is the actual data that will be passed by the parameter to other datasets.

SSRS MDX parameter setup available values

Finally, to keep things clean, we use the same dataset to provide the default values for the parameter.

SSRS MDX parameter setup default values

.. and we’re done. Now to connect it up with the Shared Dataset properly.

Parameterise a Shared Dataset

First, we need to create a parameterised shared data set. So we will add a Shared Data Set with a parameter in the text of the query, like below:

SELECT NON EMPTY
{ [Measures].[Order Count] }
ON COLUMNS,
NON EMPTY
{ ([Ship Date].[Calendar Year].[Calendar Year].ALLMEMBERS ) }
ON ROWS
FROM [Adventure Works]

WHERE STRTOSET(@CalendarYears)

Note the parameter for the query is called out by preceding it with the “@” symbol.

Now it is important to add the query in the right place to get the parameterisation to work. Don’t use the function button to add the query directly – go to the “Query Designer” option:

Define a Shared Data Set Query

Within the Query Designer, click the “Design Mode” option (the triangle on the far right):

Shared Data Set Query Designer

And then paste the query in there:

SSRS MDX parameter setup – query pasted into the Query Designer

Then you need to click on the “Query Parameters” option, 5th from the right (the toolbar icons are minuscule at 1920×1080), and set up the parameter – this time, specify the name without the @ symbol:

Shared Dataset Parameter setup

This involves naming the parameter and tying it to a Dimension, a Hierarchy and, optionally, default value(s). After that, click OK, run the query to test and, if all is well, click OK again to close the designer.

Adding the parameter to the query in this way automatically adds content to the Parameters section of the Shared Dataset – note  the parameter name here is not preceded by an “@” symbol:

Shared Dataset Parameter section

And you’re done.

Side warning about trying to set up parameters directly in the Shared Dataset properties:

One thing to be careful of is that doing parameter setup directly in the Shared Dataset properties, instead of via the Query Designer parameter setup, adds a default value to the parameter which may throw this error when you run the report:

An error occurred during local report processing.
The definition of the report ‘[Report Name]’ is invalid.

An unexpected error occurred while compiling expressions. Native compiler return value: ‘[BC40000] ‘RequestMinimum’ is obsolete: ‘Assembly level declarative security is obsolete and is no longer enforced by the CLR by default. See http://go.microsoft.com/fwlink/?LinkID=155570 for more information.’.’.

I haven’t tested any other context but this was using SSDT as part of VS2012. The reason is the default parameter format looks like this:

=new Object() {[Date].[Calendar Year].&[2008], [Date].[Calendar Year].&[2009], [Date].[Calendar Year].&[2007]}

Which causes errors at runtime. It seems to be impossible to correct as well – you can edit it but it always reverts back to the created value. So make sure you follow the steps above – as a reminder:

  1. Create a shared dataset
  2. Create the query in Query Designer mode
  3. Set up the parameter via the Query Designer “Query Parameters” window

Parameterise the Report

Next we need to set all this up in a report. Add a new dataset to the report that refers to the shared dataset we just created, and put a Tablix on the report that uses the Shared Dataset as a source.

Because the source Shared Dataset is parameterised, the report dataset will carry that parameter across:

Report Dataset Parameters

It will map the Parameter Name from the query (without an “@”) to the Parameter Value available from the report (with the “@”). Because – planning! – I named the parameter created in the first section the same as the one for the Shared Dataset, it has auto-magically mapped correctly.

Now, if you execute the report and set some values for the parameter, the parameter values will successfully pass via the report to the Shared Data Set and affect the results it passes back to the report.

There you go. Not the most obvious thing in the world, and there are lots of ways to get it wrong – especially by putting @ symbols in the wrong place – but if you follow the pattern laid out in this post, you should avoid too much trouble.

The sample project can be downloaded here: Shared Dataset Parameters Example Project


Adventures in DAX, a mild grumble

Being back in the field is proving to be interesting as I get back into the daily frustrations and challenges of development (as opposed to those of PowerPoint). Anyway, one of the new things I’m hands on with is Tabular Models and DAX table queries, and I’m feeling a little underwhelmed.

Syntax

I’m now torn over whether to hate DAX more than MDX. It seems that in the design of DAX table queries, any opportunity to make them accessible was largely ignored. What do I mean? Well… here’s how to get all the rows from a given table in a Tabular Model:

EVALUATE
(
'A Fact Table'
)

Ugh…   “EVALUATE”? I’m sure it makes sense to someone, but even MDX managed to have “SELECT”.
But then hiding in there are some weird syntactic quirks at the simplest level. Say I want to SUM a given column; I do this:

EVALUATE
(
SUMMARIZE
(
'A Fact Table',
'A Fact Table'[My Measure]
)
)

So setting aside the Americanisation of SUMMARIZE, the above only actually works if the Measure is in the base table. If it’s a calculated column, you have to give it an alias to be able to access it, i.e.:

EVALUATE
(
SUMMARIZE
(
'A Fact Table',
"My Alias", 'A Fact Table'[My Measure]
)
)

So that’s just annoying, as the error message you get if you don’t use an alias simply implies the measure isn’t there.
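
For what it’s worth, the aliased form extends naturally to grouping plus aggregation. A rough sketch only – the 'A Date Table' name, its [Calendar Year] column and the relationship to the fact table are hypothetical:

EVALUATE
(
SUMMARIZE
(
'A Fact Table',
'A Date Table'[Calendar Year],    // group by a column on the (related) date table
"Total My Measure", SUM ( 'A Fact Table'[My Measure] )    // expressions always need a "name" alias
)
)
ORDER BY 'A Date Table'[Calendar Year]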

Usability

That’s of course just the small stuff. There’s the bigger challenges.

SSMS has no DAX query editor, so you need to use an MDX query template. And then in that there’s no IntelliSense, and the Metadata panel is useless as it’s geared to querying your tabular model in MDX mode.

SSRS can only cope with DAX queries if you pretend you are writing a DMX query.

I’ve yet to interact with it via SSIS but I doubt the experience is much better.

The Upside

Anyway, it’s not all bad. To get started, take a look at these posts by Chris Webb (don’t fall over yourself looking at the official documentation though, it’s a bit sparse).

The language looks pretty powerful and getting sets of data out of it is simpler than in MDX. However it’s not as friendly as it could be and it’s lacking some basic day-to-day usability support. Hopefully this will improve by SQL2016…

Troubleshooting Service Broker Stored Procedures

A quick post on troubleshooting the Stored Procedures that can get called to handle the messages in a Service Broker queue.

The stored procedures are pretty vanilla – see step 4 in this link for an example – they take the XML of the message body and process it within the confines of the queue.
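
As a rough sketch of that shape – object names borrowed from the log line further down, with error, end-dialog and poison-message handling omitted – an activated procedure looks something like this:

CREATE PROCEDURE dbo.PayloadReceive
AS
BEGIN
    DECLARE @ConversationHandle UNIQUEIDENTIFIER
          , @MessageBody XML
          , @MessageType SYSNAME;

    WHILE (1 = 1)
    BEGIN
        BEGIN TRANSACTION;

        -- Pull the next message off the queue, waiting up to 5 seconds
        WAITFOR
        (
            RECEIVE TOP (1)
                @ConversationHandle = conversation_handle
              , @MessageBody = CAST(message_body AS XML)
              , @MessageType = message_type_name
            FROM dbo.InboundQueue
        ), TIMEOUT 5000;

        IF @@ROWCOUNT = 0
        BEGIN
            ROLLBACK TRANSACTION;
            BREAK;
        END;

        -- Anything PRINTed here ends up in the SQL Server error log, not a results pane
        PRINT 'Processing message of type ' + @MessageType;

        -- ... shred and process @MessageBody here ...

        COMMIT TRANSACTION;
    END;
END;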

However the procedures can’t be executed in the query editor, so they only log their output to the error log. To debug you need to look in the main SQL Server log – found somewhere like “C:\Program Files\Microsoft SQL Server\MSSQL11.SQLSERVER\MSSQL\Log\ERRORLOG” – openable with a text editor of your choosing.
Then hunt for lines that look like this:

2013-11-08 10:56:26.10 spid34s     The activated proc '[dbo].[PayloadReceive]' running on queue 'ServiceBrokerDatabase.dbo.InboundQueue' output the following:  'Some useful info'

To break that down, you get a timestamp, a process id, and then a message that says effectively “stored procedure” on “queue” output “some stuff”. The “some stuff” could be an error message output by SQL, or the output of a PRINT statement in your proc as you try to test.
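
If you’d rather not trawl the file by hand, you can also search the error log from a query window. A quick sketch using the undocumented (but long-standing) xp_readerrorlog procedure – treat the exact parameters as an assumption and check them on your build:

-- Search the current SQL Server error log (log 0, log type 1 = SQL Server)
-- for output from activated Service Broker procedures
EXEC master.dbo.xp_readerrorlog 0, 1, N'activated proc';

-- Both search strings must match, so a second one narrows it to a specific proc
EXEC master.dbo.xp_readerrorlog 0, 1, N'activated proc', N'PayloadReceive';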

SQL PASS DW/BI VC on Youtube!

Enough acronyms for you?

SQL PASS (the Professional Association for SQL Server) runs what they call “Virtual Chapters”, which provide online training and activity – I presented to one last year as a practice run for my TechEd session. The DW/BI chapter is run by local SQL MVP Julie Koesmarno (perhaps some of you know her as Ms SQL Girl).

Anyway – the VC has now caught up with this wacky YouTube fad thing and got themselves a YouTube channel. There are a couple of videos I’m going to get stuck into sharpish – the Data Vault (subject of a future blog post) and BIML (same) ones. There’s lots more and plenty to come, so bookmark it as a new and useful resource.

So thanks Julie – I’ve added a link to her blog on the blogroll & to the YouTube channel under training resources.


Is ETL Development doomed?

A slightly dramatic title, but over the last few months I’ve been exposed to a number of tools that will provide a strong layer of automation to ETL development, eliminating a lot of the basic need for ETL developers to shift data from System A to Database B and beyond.

I’ve been hands on with a couple:

And also heard about a couple more:

… and I’m certain this list is not comprehensive. The significant takeaway is that software build automation in the BI world is starting to catch up with where other software disciplines have already been (coded a website lately? Many IDEs do most of the hard work for you now). Much as IDE-driven tools such as DTS, SSIS and so on moved us away from hand-coding SQL and wrapping up those scripts, the latest round of tools is moving us away from the IDEs where we drag and drop our flows.

How will ETL careers be killed?

There seem to be a couple of tracks for this. First are the pure development automation tools, such as Varigence MIST. If you are technically minded, take a look at this product demo video – though I suggest skipping to about 25 minutes in to see the real meat, as it does go on a bit. It looks mindbogglingly powerful but is clearly shooting at the ETL pro who wants to churn stuff out faster, more consistently and with less fiddling about. MIST is limited to SSIS/AS (for now) and I’m not sure how far it will go, as it’s clearly aimed at the pro developer market, which is not always where the big buyers are. I expect to be playing with it more over the next few weeks on a live project so should be able to get a better view.

The second path appears to be more targeted at eliminating ETL developers in their entirety. AnalytixDS wraps up metadata import (i.e. you suck in your source and target metadata from the systems or ERWIN), lets you do the mapping of fields and apply rules, then “push button, make code”. Obviously there’s a bit more to it than that, but the less you care about your back end and the quality of your ETL code (cough Wherescape cough), the more likely this product will appeal to you. Say hello, business users, who are the big buyers (though I look forward to troubleshooting your non-scalable disasters in the near future).

What’s the diagnosis, doctor?

Long term, the demand for ETL skills will decline on the back of these tools. Simple ETL work will simply go away, but the hard stuff will remain and it will become an even more niche skill that will pay handsomely – though you may spend more time configuring and troubleshooting products than working with raw code. Which class of tool dominates is uncertain, but I’m leaning towards the business oriented mapping tools that completely abstract away from ETL development altogether.

If you’ve had any experience with these tools, I’d love to hear about them in the comments.

BI Monkey goes independent!

After many years working as a consultant for a number of providers small and large, and servicing clients across a broad range of industries, I have now taken the plunge and decided to operate independently.

I’ll be providing independent Business Intelligence Consulting to organisations across Australia, focusing on:

  • Strategy Creation and Review
  • Solution Architecture
  • Data Warehousing and ETL
  • Microsoft Business Intelligence technical support
  • Microsoft Business Intelligence training
  • Agile Enablement

Full details can be found at my “Consulting and Technical Services” page

Some of this decision was informed by reading James Serra’s posts from his master post “Blueprint for consulting riches” – which is ironic given his recent move to Microsoft as a full-time employee. Either way, his series is well worth a read for those mulling over their approach to work.

Gartner 2014 Magic Quadrant for Business Intelligence

The Gartner Magic Quadrant for Business Intelligence and Analytics Platforms is now available.

Good news for Microsoft again – it remains in the Leaders quadrant, though, in line with all the other MegaVendors, it has slipped a bit due to a weak story around data discovery. It remains a well-loved platform among users, developers and architects alike, and is increasingly becoming the standard enterprise product. For those of us working in the field it remains a safe bet from a career point of view for a good few years yet.

On the downside there are the same bugbears we are still complaining about – no credible mobile story, metadata management is non-existent (hello Project Barcelona – no news for 2 years now?)  and PowerView, while shiny, is no match for the likes of QlikView or Tableau (regardless of how ugly they are behind the shiny screens, that’s what the users see and judge you on).

Anyway, not too shabby a report card, a decent score but the usual caveat of “could try harder”. But the other big kids (IBM Cognos, SAS, Oracle) are doing pretty much the same so not much to worry about.

PowerPoint and productivity

Dilbert 24/12/2005

Eight slides of productivity! Impressive!


An in-joke for one of my fellow leaders in the BI industry…

