Using SUMMARIZE to get a Maximum Per Day in a Period

A quick snippet on the following scenario. I had daily data of user sessions in a system, and wanted to know, for a given period, the maximum number of unique users on a single day. My data looked like this:

Data sample

A user could connect multiple times per day, so I needed a DISTINCTCOUNT to get unique users. However, for a given period I needed to know this per day. So for the period I needed to calculate the number of unique users per day in that period – which meant I needed to create an interim table using SUMMARIZE.

This resulted in this construction:

SUMMARIZE(Logs,[Session Date],"UsersPerDay",DISTINCTCOUNT([User]))

“Logs” is my source data table. “Session Date” is what I am grouping my table by to get the results per day. The context of the period I am looking at (be it year, month, quarter, whatever) is managed by the date filters I apply to the table. “UsersPerDay” is just the name I assign to the summary column, which is the DISTINCTCOUNT of the User field.

What I will end up with is an interim table which has – per day – the number of distinct users. Though it will not be materialised, in memory it would look like this:

SUMMARIZE result

Then, to get the maximum in a day for a period, we just need the MAX of the UsersPerDay values in this table. As we are taking the maximum of an expression evaluated over a table, we lean on MAXX:

MaxUsersPerDay:=MAXX(SUMMARIZE(Logs,[Session Date],"UsersPerDay",DISTINCTCOUNT([User])),[UsersPerDay])

And there we have it! Note that in MAXX the expression we take the maximum of is the custom “UsersPerDay” column we created in the SUMMARIZE function. Intellisense won’t pick this up as it’s not part of the model, but the formula works just fine.
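As an aside, the same interim table can be built by wrapping a plain SUMMARIZE in ADDCOLUMNS – here’s a sketch of the equivalent, using the same table and column names as above, which some prefer as it keeps the grouping and the calculation separate:

MaxUsersPerDay:=
MAXX(
ADDCOLUMNS(
SUMMARIZE(Logs,Logs[Session Date]), // group to one row per day
"UsersPerDay",CALCULATE(DISTINCTCOUNT(Logs[User])) // distinct users for that day
),
[UsersPerDay]
)

The CALCULATE wrapper forces context transition so the DISTINCTCOUNT is evaluated per day rather than over the whole table.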


Managing complex security in SSAS Tabular – Yeah Nah edition

A couple of times recently I have come up against requirements needing some fairly complex logic to apply security. One involved some fairly gnarly relationships coming from multiple directions; the other involved grinding through Hierarchies from parent nodes down to permitted viewable children.

The problem in both cases is that though the logic can sometimes be written (albeit usually in an ugly as hell manner), the functions needed to do so perform atrociously. For complex relationships you are obliged to take in context after context, changing filters and doing all sorts of DAX voodoo. As we know by now, avoiding walking relationships is good for performance. Hierarchies can be managed through the PATH function, but that’s a text operation that is far from speedy.

Let’s give a quick example of some complex security – consider the below data model:

Complex Model

Here the security controls who can see what has been spent on a Task in the FactTable object. Who can see what depends on their Role and/or the Unit they are in. There is also a 1:many relationship between a person and the logins they can use.

So for dynamic security you need to navigate from the User Id to the Person and assess what Unit they are in for the Unit based permissions. You also need to assess what Role they are in to get the Role based permissions.

I took one look at this and shuddered at the messy DAX I was going to have to write, plus how terribly it would perform.

Do it in the Cube? Yeah Nah.

So I thought “Yeah nah” and decided the cube was the wrong place to be doing this. Ultimately all I was trying to do was pair a given login with the set of tasks that login had permission against. This is something that could easily be pushed back into the ETL layer. The logic to work it out would still be complex, but at the point of data consumption – the bit that really matters – there would be only minimal thinking left for the cube engine.

So my solution enforces security through a role scanning a two-column table which contains all valid pairings of login and permitted tasks to view. Very fast to execute when browsing data and a lot easier to code for. The hard work is done in loading that table, but the cube application of security is fast and easy to follow. The hierarchy equivalent is a pairing of User Id with all the nodes in the Hierarchy that are permitted to be seen.
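As a rough sketch of the cube-side piece – assuming the pairing table is called UserTasks, with Login and TaskKey columns (names illustrative) – the role’s row filter on the fact table needs little more than:

=CONTAINS(
UserTasks,
UserTasks[Login],USERNAME(), // current user's login
UserTasks[TaskKey],FactTable[TaskKey] // task on the current fact row
)

All the heavy lifting – roles, units, hierarchies – has already been flattened into UserTasks by the ETL, so the engine just scans a two-column table.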

As a final note, for those non-Aussie readers the expression “Yeah nah” is a colloquialism that implies that the speaker can’t be bothered with the option in front of them. For example: “Do you want a pie from the Servo, Dave?” “Yeah nah.”


Dynamic Time Variant calculations in DAX

A common requirement in any set of calculations is to create a range of time variants on any measure – Prior Period, Year to Date, Prior Year to Date, Prior Quarter…   you think of a time slice and someone will find it useful.

However the downside to this is that in the model you end up maintaining lots of calculations that are all largely doing the same thing. Any good coder likes to parameterise and make code reusable. So how could we do this in Tabular? There is a way: a very specific variant of the idea of Parameter Tables.

Disconnect your Dimensions!

Step one is to unhook your Date Dimension from your fact table. This may seem counter-intuitive, but it frees you to use the Date dimension as a source of reference data that doesn’t filter your data when you select a date – this simplifies the subsequent calculations significantly. You also need to add to the date dimension all the dates you will need to perform your calculations – Year starts, Prior Year starts, Period starts etc. This isn’t compulsory, but you’ll be grateful later on when you need these dates and don’t have to calculate them on the fly, trust me. Your Date table (I’m going to stop calling it a Dimension, it isn’t any more) will end up looking something like this:

Date Table
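The pre-calculated columns themselves are nothing fancy. As a sketch – assuming here that a “period” means a calendar month – they can be simple calculated columns along these lines:

PeriodStart = DATE(YEAR([Date]),MONTH([Date]),1) // first day of this month
PriorPeriodStart = EDATE([PeriodStart],-1) // first day of last month
YearStart = DATE(YEAR([Date]),1,1)
SamePeriodPriorYearStart = EDATE([PeriodStart],-12)
PriorYearStart = DATE(YEAR([Date])-1,1,1)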

In practice you would hide all the columns apart from the Date as this is the only one that actually gets used by users.

Time for the Variants

Next, we need to create a simple filter table to apply the Time Variant calculations. All it needs is a numeric identifier per variant and a variant name, like so:

Variants Table

This – quite clearly – isn’t the clever bit. The thing to observe is that every one of these variants defines a date range. So what we need to do is calculate the applicable Start and End dates of that range. This is the bit where we are grateful we pre-calculated all those dates in our Date table. We add two Measures to the table, StartDate and EndDate, which detect which Time Variant is being calculated and then work out the appropriate date, based on the currently selected date. The DAX for StartDate looks like this:

StartDate:=
SWITCH(MIN([VariantID]),
1,MIN(Dates[PeriodStart]),
2,MIN(Dates[PriorPeriodStart]),
3,MIN(Dates[YearStart]),
4,MIN(Dates[SamePeriodPriorYearStart]),
5,MIN(Dates[PriorYearStart])
)

We use a SWITCH statement against the VariantID to detect which Variant we are trying to get the date range start for, then pick the right date from the Date Table. Pre-calculating these in the Date table keeps this part simple.
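The matching EndDate measure follows exactly the same pattern – sketched below, where the *End columns are assumed equivalents of the start columns above, and where the current period and Year to Date ranges simply end at the selected date itself:

EndDate:=
SWITCH(MIN([VariantID]),
1,MAX(Dates[Date]), // current period: up to the selected date
2,MIN(Dates[PriorPeriodEnd]),
3,MAX(Dates[Date]), // Year to Date: up to the selected date
4,MIN(Dates[SamePeriodPriorYearEnd]),
5,MIN(Dates[PriorYearEnd])
)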

Add it all up

The final part is to pull these dates into the measure:

TotalTransactionAmount:=SUMX(CALCULATETABLE(Transactions,DATESBETWEEN(Transactions[TransactionDate],[StartDate],[EndDate])),Transactions[TransactionAmount])

This works by using the DATESBETWEEN function to apply a custom date range filter to the Transactions table – a range we create dynamically through our StartDate and EndDate calculations.

Our end result:

Time Variant Results

We can see above that, for a single selected date, we can generate a range of Start and End dates and apply those to our single summarising function to create multiple Time Variations.

The sample workbook is here: DAX Time Variants


PowerPivot calculated columns are not dynamic

A quick and dirty one – while attempting some clever dynamic security modelling in DAX I was warned about a gotcha in my design: calculated columns are only evaluated when the model is processed, so any approach based on them was doomed to failure. I didn’t quite believe it, so I decided to do a road test in PowerPivot. Using a simple data set of one Dimension and one Fact, like below:

Simple Data Model

I hooked them up in PowerPivot with a relationship between “Column” on both tables. Then, using the ISFILTERED() function, I created a couple of calculations. One, at row level, that would return a value of 1 if I filtered on the Attribute column:

=IF(ISFILTERED(Dim[Attribute]),1,0)

Which I then put a SUM on top of – something like the sketch below, where the measure and column names are illustrative:
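FilterCheckColumnSum:=SUM(Dim[FilterCheck]) // [FilterCheck] being the calculated column above

I also added one at measure level, performing a similar function: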

FilterCheckAsMeasure:=IF(ISFILTERED(Dim[Attribute]),1,0)

Then I created a Pivot Table checking the results, and got this:

Results

The takeaway being that filtering on the Attribute was picked up by the table level measure, but the calculated column did not change value.

You can have a check of this yourself in my sample workbook: DAX Calculated Columns Evaluation Order.

What does the documentation say?

Well, nothing terribly clear, but on this page there is this small paragraph:

When a column contains a formula, the value is computed for each row. The results are calculated for the column as soon as you create the formula. Column values are then recalculated as necessary, such as when the underlying data is refreshed.

It doesn’t say exactly when “recalculated as necessary” is necessary, but the implication is that it’s a model-level change rather than the switching of context – or, in my case, the changing of the user viewing the model.

So in summary, we have to assume that our calculated column values are fixed at initial calculation, formula change or data load (or processing in a Tabular model), and there isn’t a way to make the value in a given cell change.


Dynamic Parent Child Security using DAX

I was recently posed a challenge to implement the following requirements in a Tabular model:

  1. Secure a Parent Child dimension at a Parent level, allowing all children to be visible
  2. Allow multiple Parents to be securable to a given user
  3. Keep it as easy to administer as possible

The first two requirements are technical and internal to the tabular model, but the last inherently implied that use of Roles and AD Groups was out, as that is not a very user friendly approach.

Note that in this exercise I will be using the DimEmployee Dimension from the AdventureWorks 2012 DW sample database.

Row Level Security in Tabular Models

In Tabular Models, security is applied at the row level through row filters that restrict what a given Role can see. You can make the effect of a Role dynamic through the use of lookup tables and the USERNAME() function. There is a detailed white paper on securing Tabular Models on MSDN: Securing the Tabular BI Semantic Model

In a non-parent child situation this is simple to manage as it is easy to establish a relationship between viewable members (i.e. rows) and users via a bridging table – the paper mentioned above has a clear example of this.

How Parent Child Dimensions complicate things

Parent Child Dimensions add complications because there isn’t a straightforward relationship to manage. You can secure the parent by filtering on the parent’s unique key – as below – but any child row, while aware of the relationship to its immediate parent via the ParentEmployeeKey, cannot be filtered on the same column.

DimEmployee in AdventureWorks

A consequence of this is that if you want to secure using row filters, you would need a table that secures each parent and child explicitly for each user. This is obviously not easy to administer and if the hierarchy changes you need to rebuild your security all over again.

How can I make Security Dynamic?

DAX, of course! This is a variation on the scenario described in the white paper – except in this case complicated because the table to be secured contains no data that can be used to apply security directly.

How do I apply this to a Parent-Child Dimension?

Fortunately there are ways to make rows aware of the hierarchy that applies to them. Of particular relevance in our case are the PATH and PATHCONTAINS functions.

The PATH function returns a text delimited list of hierarchy members from the current member to the topmost parent. The PATHCONTAINS function scans that list for a specific member and returns a boolean TRUE/FALSE result as to whether that specific member is there. So if we have the following functions in our model:

PATH:=PATH([EmployeeKey],[ParentEmployeeKey])

PATHCONTAINS:=PATHCONTAINS(PATH([EmployeeKey],[ParentEmployeeKey]),189)

We get a set of results like this:

Parent Child Dimension with PATH and PATHCONTAINS DAX functions
Parent Child Dimension with PATH and PATHCONTAINS DAX functions

So the PATH for Employee Key 12 is “112|23|189|12” – key 12 is the leaf, and the three preceding members are its ancestors up to the topmost parent. The PATHCONTAINS test for member 189 returns TRUE as it is present in the PATH.

So now we have a means of evaluating, in a single column, what that row’s place in the hierarchy is – and more importantly, what its parents are.

The next challenge is to map this to the user data. Now, as I mentioned, there isn’t a meaningful relationship that can be established between the data and the security table, which I’ll show below:

PCSecurity

This means any relationship needs to be generated on the fly. Also note it’s not a single higher parent per user – a user may need to be able to view different paths in the hierarchy. An example would be a manager with responsibility for managing employees’ performance across multiple units but without a direct reporting line in the hierarchy.

GENERATE a relationship dynamically

The data items can be linked using the GENERATE function in conjunction with some FILTERs to determine what applies. GENERATE creates a cartesian product of two tables. So by using GENERATE between the Dimension and the Security table, we get a view of all possible user / dimension level combinations:

GENERATE(UserSecurity,DimEmployee)

Now this is only a starting point, because we need to return a single value within our expression otherwise DAX will throw an error. Also, we don’t need everything from UserSecurity, just the items that apply to the current user. So first, let’s FILTER our UserSecurity table:

GENERATE(FILTER(UserSecurity,UserSecurity[User]="U4"),DimEmployee)

Note I’ve hardcoded the User here to “U4” but in a real-life context you would use the USERNAME function to get that name. Now we’ve cut down one side of the cartesian product, but we also need to cut down the other side to get the members we are allowed to see. So here we filter using PATHCONTAINS to pick up the items that are the permitted members or children of those members:

GENERATE(FILTER(UserSecurity,UserSecurity[User]="U4"),FILTER('DimEmployee',PATHCONTAINS(PATH([EmployeeKey],[ParentEmployeeKey]),UserSecurity[Highest Parent])))

So now in the generated table we have all the members of the Dimension that the current user is permitted to see. It’s worth taking a quick diversion from DAX to show how this works in practice:

PATHCONTAINS Example

Because the PATH contains the keys for all members in the hierarchy from the current member up to the topmost parent, any PATH that contains the highest allowable parent will pass the PATHCONTAINS test. So if looking for member 189, rows 1 & 2 fail because, though they are part of the overall path, they are above the highest permissible member. Rows 3 & 4 pass because they feature the highest permissible member in their path to the topmost parent. Row 5 fails because the highest permissible parent doesn’t feature in its PATH at all.

Note that because the result at this point is still a Table as opposed to a single value, it handles users with multiple permissions correctly. The next part is to make sure that the specific Dimension row we are evaluating is in the permitted list, which we can do by putting an outer FILTER on the GENERATE.

FILTER(GENERATE(FILTER(UserSecurity,UserSecurity[User]="U4"),FILTER('DimEmployee',PATHCONTAINS(PATH([EmployeeKey],[ParentEmployeeKey]),UserSecurity[Highest Parent]))),DimEmployee[EmployeeKey]=EARLIER(DimEmployee[EmployeeKey]))

We use the EARLIER function to get the current value of the EmployeeKey from the outer evaluation pass and give us the context specific to this row. Our final step is to convert this into a single value, which I do using a COUNTROWS function:

COUNTROWS(FILTER(GENERATE(FILTER(UserSecurity,UserSecurity[User]="U4"),FILTER('DimEmployee',PATHCONTAINS(PATH([EmployeeKey],[ParentEmployeeKey]),UserSecurity[Highest Parent]))),DimEmployee[EmployeeKey]=EARLIER(DimEmployee[EmployeeKey])))

Because of the way the data works in this example, COUNTROWS will return either 1 if authorised or 0 if unauthorised.

The final step is then to put this Authorised calculation in a calculated column on DimEmployee and use it in the Measures to ensure we can only see permitted values for that user, which is no more complex than:

AuthorisedSum:=CALCULATE(SUM(Data[Value]),'DimEmployee'[Authorised]=1)

… and it’s done. Totals created by this measure will be sensitive to what the user is entitled to see and be fully aware of the Parent / Child relationship even though we only explicitly secure a parent.

Seeing this in action, we see that user U4 is authorised to see the two nodes from keys 290 (Alberts) & 294 (Abbas) downwards:

Security Demo Pivot

As I mentioned previously, this doesn’t secure the Hierarchy members themselves from being visible, so the path up the tree is still visible – but there will be no values coming through that are tied to those members.

Quick wrap

So this post has been pretty dense. If you want to get your hands on the result I have provided a sample PowerPivot workbook which has the completed example.

As a reminder, we set out to:

  1. Secure a Parent Child dimension at a Parent level, allowing all children to be visible
  2. Allow multiple Parents to be securable to a given user
  3. Keep it as easy to administer as possible

We did this by using the PATH function to make each row aware of where it sits in the Parent / Child Hierarchy. Then, using the GENERATE function and some FILTERs to dynamically mimic a relationship, we worked out what security could apply – with the PATHCONTAINS function doing the work of deciding which rows a user is permitted to see.

So we hit point 1 on functionality, point 2 through the nature of the solution – and by using a simple table rather than Roles / AD type solutions we hit point 3.

Finally, a big hat tip to a post from Alberto Ferrari which helped me make sense of Parent Child in Tabular in the first place.


Passing Parameters to MDX Shared Datasets in SSRS

So part of my ongoing punishment work involved making some changes to an SSRS report that was running off some MDX, querying a Tabular Model. It needed Parameters to feed Shared Datasets… and the two had to talk nicely. This was not as easy as it seems…

First of all, a big warning to anyone wanting to mix DAX and MDX on the same report: the two won’t talk well to each other. So in the below examples, whilst the DAX queries to get the parameters would have been relatively easy, they ultimately had to feed an MDX query (as that was what was pre-existing). This is a problem because Member Names are not in the same syntax in DAX and MDX, so you can’t easily feed a selected Member from a DAX resultset into an MDX operator like STRTOMEMBER.
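To see why, here is a sketch of what a DAX parameter dataset might look like (assuming a Tabular AdventureWorks where Calendar Year is a plain column):

EVALUATE
VALUES('Date'[Calendar Year]) // returns bare values like 2008
ORDER BY 'Date'[Calendar Year]

That hands back plain column values, not the [Date].[Calendar].[Calendar Year].&[2008] style unique names that an MDX operator like STRTOMEMBER expects – hence the MDX parameter queries below.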

So, what we will do in this post is set up a simple report that has a user selectable parameter in the report that affects a shared dataset. Note the below examples run off AdventureWorks 2012.

Setting up a Parameter from MDX

Setting up an MDX Parameter to be usable is a little adventure in itself.  In order to be a Parameter that feeds into an MDX query, you need to pick up some additional Member properties along the way. So while this may get the members you want on the rows:

WITH
MEMBER    Measures.[Dummy] AS NULL

SELECT
{
[Measures].[Dummy]
}
ON COLUMNS,
{
[Date].[Calendar].[Calendar Year].Members
}
ON ROWS

FROM [Adventure Works]

It won’t play nicely when you try to use the member name in a function such as STRTOMEMBER, as it doesn’t return the Unique Member Name, which that function needs to work properly. So you need to write something like this:

WITH
MEMBER    Measures.ParameterCaption AS [Date].[Calendar].CURRENTMEMBER.MEMBER_CAPTION
MEMBER    Measures.ParameterUniqueName AS [Date].[Calendar].CURRENTMEMBER.UNIQUENAME
MEMBER    Measures.ParameterLevel AS [Date].[Calendar].CURRENTMEMBER.LEVEL.ORDINAL

SELECT
{
[Measures].[ParameterCaption]
,    [Measures].[ParameterUniqueName]
,    [Measures].[ParameterLevel]
}
ON COLUMNS,
{
[Date].[Calendar].[Calendar Year].Members
}
ON ROWS 

FROM [Adventure Works]

What this adds, by returning these member properties, is values that can be used more effectively in SSRS parameters. The MEMBER_CAPTION gives the Member name for display, UNIQUENAME gives the unique member name in the hierarchy so it can be used reliably in downstream MDX calculations, and finally the level ORDINAL number can be used if depth in the hierarchy is important.

So now, to set it up in SSRS. I won’t go into too much detail – you can get most of this from the sample, but essentially create a shared dataset with the MDX for the parameter, add a report, then within the report create a dataset that uses the shared dataset. Then, we add a parameter and set it up as below (click on the images to see the full screen – the screen grabs were too big).

First, give it a name, a user friendly label and allow multiple values (if you want):

SSRS MDX parameter setup properties

Then, we point the parameter to get values from the shared dataset we created for the parameter. Here we use the additional properties we added in the MDX query.  We use the member UNIQUENAME as the Value field and the CAPTION as the Label field. The Value field is the actual data that will be passed by the parameter to other datasets.

SSRS MDX parameter setup available values

Finally, to keep things clean, we use the same dataset to provide the default values for the parameter.

SSRS MDX parameter setup default values

.. and we’re done. Now to connect it up with the Shared Dataset properly.

Parameterise a Shared Dataset

First, we need to create a parameterised shared data set. So we will add a Shared Data Set with a parameter in the text of the query, like below:

SELECT NON EMPTY
{ [Measures].[Order Count] }
ON COLUMNS,
NON EMPTY
{ ([Ship Date].[Calendar Year].[Calendar Year].ALLMEMBERS ) }
ON ROWS
FROM [Adventure Works]

WHERE STRTOSET(@CalendarYears)

Note the parameter for the query is called out by preceding it with the “@” symbol.

Now it is important to add the query in the right place to get the parameterisation to work. Don’t use the function button to add the query directly – go to the “Query Designer” option:

Define a Shared Data Set Query


Within the Query Designer, click the “Design Mode” option (the triangle on the far right):

Shared Data Set Query Designer

And then paste the query in there:

The query pasted into the Query Designer

Then you need to click on the “Query Parameters” option – 5th from the right (the toolbar icons are minuscule at 1920×1080) – and set up the parameter. This time, specify the name without the @ symbol:

Shared Dataset Parameter setup

This involves naming the parameter and tying it to a Dimension, Hierarchy and, optionally, default value(s). After that, click OK, run the query to test, and if all is OK, click OK to close the designer.

Adding the parameter to the query in this way automatically adds content to the Parameters section of the Shared Dataset – note  the parameter name here is not preceded by an “@” symbol:

Shared Dataset Parameter section

And you’re done.

Side warning about trying to set up parameters directly in the Shared Dataset properties:

One thing to be careful of: setting up parameters directly in the Shared Dataset properties, instead of via the Query Designer parameter setup, adds a default value to the parameter which may throw this error when you run the report:

An error occurred during local report processing.
The definition of the report ‘[Report Name]’ is invalid.

An unexpected error occurred while compiling expressions. Native compiler return value: ‘[BC40000] ‘RequestMinimum’ is obsolete: ‘Assembly level declarative security is obsolete and is no longer enforced by the CLR by default. See http://go.microsoft.com/fwlink/?LinkID=155570 for more information.’.’.

I haven’t tested any other context but this was using SSDT as part of VS2012. The reason is the default parameter format looks like this:

=new Object() {[Date].[Calendar Year].&[2008], [Date].[Calendar Year].&[2009], [Date].[Calendar Year].&[2007]}

Which causes errors at runtime. It seems to be impossible to correct as well – you can edit it but it always reverts back to the created value. So make sure you follow the steps above – as a reminder:

  1. Create a shared dataset
  2. Create the query in Query Designer mode
  3. Set up the parameter via the Query Designer “Query Parameters” window

Parameterise the Report

Next we need to set all this up in a report. Add a new dataset to the report that refers to the shared dataset we just created, and put a Tablix on the report that uses the Shared Dataset as a source.

Because the source Shared Dataset is parameterised, the report dataset will carry that parameter across:

Report Dataset Parameters

It will map the Parameter Name from the query (without an “@”) to the Parameter Value available from the report (with the “@”). Now – because of planning – I named the parameter created in the first section the same as that for the Shared Dataset, so it has auto-magically mapped correctly.

Now, if you execute the report and set some values for the parameter, the parameter values will successfully pass via the report to the Shared Data Set and affect the results it passes back to the report.

There you go. Not the most obvious thing in the world, and there’s lots of ways to get it wrong – especially by putting @ symbols in the wrong place – but if you follow the pattern laid out in this post, you should avoid too much trouble.

The sample project can be downloaded here: Shared Dataset Parameters Example Project



Adventures in DAX, a mild grumble

Being back in the field is proving to be interesting as I get back into the daily frustrations and challenges of development (as opposed to those of PowerPoint). Anyway, one of the new things I’m hands on with is Tabular Models and DAX table queries, and I’m feeling a little underwhelmed.

Syntax

I’m now torn over whether to hate DAX more than MDX. It seems that in the design of DAX table queries, any opportunity to make them accessible was largely ignored. What do I mean? Well… here’s how to get all the rows from a given table in a Tabular Model:

EVALUATE
(
'A Fact Table'
)

Ugh… “EVALUATE”? I’m sure it makes sense to someone, but even MDX managed to have “SELECT”.

But then hiding in there are some weird syntactic quirks at the simplest level. Say I want to SUM a given column – I do this:

EVALUATE
(
SUMMARIZE
(
'A Fact Table',
'A Fact Table'[My Measure]
)
)

So, setting aside the Americanisation of SUMMARIZE, the above only actually works if the Measure is a column in the base table. If it’s a calculated column, you have to give it an alias to be able to access it, i.e.:

EVALUATE
(
SUMMARIZE
(
'A Fact Table',
"My Alias", 'A Fact Table'[My Measure]
)
)

So that’s just annoying, as the error message if you don’t use an alias simply implies the measure isn’t there.

Usability

That’s of course just the small stuff; then there are the bigger challenges.

SSMS has no DAX query editor, so you need to use an MDX query template. And then within that there’s no Intellisense, and the Metadata panel is useless as it is geared for querying your tabular model in MDX mode.

SSRS can only cope with DAX queries if you pretend you are writing a DMX query.

I’ve yet to interact with it via SSIS but I doubt the experience is much better.

The Upside

Anyway, it’s not all bad. To get started, take a look at these posts by Chris Webb (don’t fall over yourself looking at the official documentation though, it’s a bit sparse).

The language looks pretty powerful and getting sets of data out of it is simpler than MDX. However it’s not as friendly as it could be and it’s lacking some basic support as part of day to day usability. Hopefully this will improve by SQL2016…
