SSAS Tabular Date Compression revisited

A while ago I wrote about some tricks to reduce the size of a tabular model, in which I saw some strange results for SSAS Tabular date compression. I am lucky enough to be sitting in Marco Russo’s Mastering DAX course (a benefit of helping organise the course) and it has all become clear.

How are datetimes stored in Vertipaq?

Under the hood, datetimes are stored in Tabular as floats – with the date portion as the integer part and the time portion as the fractional component (i.e. the bits after the decimal point).

This explains my results:

  • Dates compress well as they are treated as integers
  • Datetimes compress terribly as they are treated as floats
  • The less time precision you have (e.g. seconds vs. milliseconds) the better the compression, as the float has lower precision
  • Baselining a datetime to a specific date to isolate the time doesn’t work as you are still compressing a float

So the recommendations for handling datetime columns stand, i.e.:

  • Load the date and time portions separately
  • Load the date portion as a date
  • Load the time portion as an integer count of your chosen unit since midnight
  • Reduce the precision of the time portion to the lowest acceptable to the business
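As a quick illustration of the split (a Python sketch of the ETL transformation, assuming second-level precision is acceptable to the business):

```python
from datetime import datetime

def split_datetime(dt: datetime):
    """Split a datetime into a date (which compresses as an integer)
    and the time as whole seconds since midnight."""
    date_part = dt.date()
    seconds_since_midnight = dt.hour * 3600 + dt.minute * 60 + dt.second
    return date_part, seconds_since_midnight

# e.g. 2015-10-13 14:30:15 -> (2015-10-13, 52215)
d, s = split_datetime(datetime(2015, 10, 13, 14, 30, 15))
```

Load the two outputs as separate columns and Vertipaq gets two integer-friendly columns instead of one incompressible float.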

Read More

Why Disruption and Innovation are coming and you aren’t smart enough to take advantage

The fundamental point of this ramble is this – most of you worrying about Disruption and Innovation have literally no idea what is around the corner, and all your efforts are probably an epic waste of time. Yes, this is a “the machines are coming!” post.

To give a little human context: on Friday night, after a couple of decent glasses of red, I got into a discussion with the BI Monkey’s other half about how her employer was chasing innovative customer experiences, vision and so forth. This set me off on a bit of a tangent on the scale of corporate vision, as earlier in the day I had been reading about Elon Musk’s vision to get humanity to Mars, which eats most corporate visions for breakfast in terms of sheer breadth of ambition and scope. On the way I got sidetracked into AI (due to reading this earlier in the day). These coalesced over the weekend into this slightly rambling post.

Corporate vision beyond “great customer experience”

One point which I will totally fail to address but want to think more about is – if we create a true intelligence, why is it Artificial?

Disruption and Innovation you are (possibly) smart enough for

There are three rough classes of AI, the first of which is Artificial Narrow Intelligence (ANI) – smart, but within very narrow boundaries. For example, my car has adaptive cruise control and through some tech voodoo is smart enough to match its speed with the vehicle in front, apply emergency braking if needed and warn me if I stray out of lane. This is all very spooky on first use but very quickly becomes something to switch on by default as I realise the car pays more attention to traffic than I do. Note I am not one of the majority of drivers who think they are better than average – I am definitely not that great and would rather a robot was in charge. Driving is hard and stressful, especially on busy roads (long country drives are a different story).

Anyway, the point is that the car is a way better driver than me, it’s smart, attentive, doesn’t get annoyed by the jerk in front cutting in and has absolutely no understanding of why it is going where it is going, what a sandwich is or how to play chess. It has a Narrow focus and is unlikely to move out from those guide rails.

Don’t cut in front of these robot cars, you jerk

This form of AI can be disruptive as we understand it. Truly autonomous vehicles are going to disrupt our economy like you wouldn’t believe, and we are less than 10 years away from it now (Volvo have just stepped up and said they will accept liability if their self-driving cars crash). Innovations like Uber’s approach will leverage these disruptions.

What I will gloss over here is that this class of Disruption is going to hammer the economy by removing about 40% of jobs, and if you aren’t in a role leading the disruption in those 40% of jobs, you’d better retrain. A lot of the Disruption/Innovation axis cheerleaders like to skip over the fact that the massive improvements for customers will unfortunately remove the ability of a lot of those customers to actually earn anything, but hey ho. We can always hope for a Universal Basic Income, as per Stephen Hawking’s observation that one of the big future problems we face is not the lack of jobs but how the balance of power in society alters once most of us are no longer capable of adding any value to the economy.

Disruption you are probably too dumb for

ANI however is a limited threat, as it still needs a lot of help from us humans to be designed, given a purpose and boundaries, yadda yadda. At some point – currently estimated at around 2040 (give or take 10 years) – we will create our first Artificial General Intelligence (AGI): AI that can learn, set its own boundaries, solve general problems and no longer need handholding by us. At this point you can pretty much forget Innovation. The machines will largely be better at solving problems than us. Our best bet is that we can at least ask the right questions – ones a standalone intelligence could not pose or use by itself – and still get some answers.

The Disruption caused by our ability as a species to create an on-demand intellect to address problems will be that – as an economic resource – our value will plummet. A machine intelligence will be able to answer thousands of questions at once, have perfect recall and access to more data than we can hope to ingest in our lifetime and – best of all from an economic perspective – will have marginal input costs (it won’t need pay, for a start) and no issues with motivation.

We will make the robots do all the Excel and they will never read Buzzfeed

There will be a brief burst of innovations as people pose interesting problems to these new intelligences and apply the solutions, but try holding on to competitive advantage when you can ask your AI how their AI solved a problem and get an answer in seconds. R&D lag is gone; specialist expertise and accumulated experience are rendered worthless.

Then of course these AGIs will take the next step.

The Disruption which will make your Innovation irrelevant

The next step in the scale is Artificial Superintelligence (ASI). This is where an AGI is smart enough to make itself smarter and exceeds our intelligence altogether, at a point known as The Singularity. At this point all bets are off. The ASI will be able to solve any problem we throw at it, create and solve new ones we weren’t even aware of, and either drive us to extinction or gently herd us into the future in ways that will boggle our tiny little minds. The economy as we understand it will cease to be relevant, so at least you won’t have to worry about your job.

So, we have to hope that we get Multivac or The Culture. The alternative is less about the ASI being hostile, and more about it being as indifferent to us as we are to a microbe. We may not consciously set out to destroy a microbe, but we may do so through our actions because we do not consider the microbe as we act.

Gratuitous Cthulhu picture because you may want to get eaten first

Are we all going to die, BI Monkey? What should we do?

The safest course of action is to transfer all your money to me immediately and… I haven’t thought about step 2. Perhaps take more of an interest in this subject so we can be better prepared for the coming changes. We still have 20 years to Disrupt and Innovate like it matters in the meantime.

Read More

Using SUMMARIZE to get a Maximum Per Day in a Period

A quick snippet on the following scenario: I had daily data of user sessions in a system and wanted to know, for a given period, the maximum number of unique users on a single day. My data looked like this:

Data sample

A user could connect multiple times per day, so I needed a DISTINCTCOUNT to get unique users. However, for a given period I needed to know this per day. So for the period I needed to calculate the number of unique users per day – which meant creating an interim table using SUMMARIZE.

This resulted in this construction:

SUMMARIZE(Logs,[Session Date],"UsersPerDay",DISTINCTCOUNT([User]))

“Logs” is my source data table. “Session Date” is what I am grouping my table by to get the results per day. The context of the period I am looking at (be it year, month, quarter, whatever) is managed by the date filters I apply to the table. “UsersPerDay” is just the name I assign to my measure, which is the DISTINCTCOUNT of the User field.

What I will end up with is an interim table which has – per day – the number of distinct users. It will not be materialised, but in memory it is a simple two-column table: one row per Session Date with its UsersPerDay count.
Then, to get the maximum in a day for a period, we just need the MAX of the UsersPerDay in this table. As it’s an expression, we lean on MAXX:

MaxUsersPerDay:=MAXX(SUMMARIZE(Logs,[Session Date],"UsersPerDay",DISTINCTCOUNT([User])),[UsersPerDay])

And there we have it! Note that in MAXX the expression we take the MAX of is our custom “UsersPerDay” column created in the SUMMARIZE function. IntelliSense won’t pick this up as it’s not part of the model, but the formula works just fine.

Read More

Option Syntax for OData.Feed in PowerQuery

Technical post – for those who need to add options / parameters to their OData feeds in PowerQuery, syntax help is a little thin in the official documentation.

So for those moving beyond the basic connection of
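The snippet originally shown here appears to have been lost, so this is a minimal reconstruction – the public Northwind service URL is just a placeholder for your own feed:

```
= OData.Feed("http://services.odata.org/V4/Northwind/Northwind.svc")
```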


If you want to add headers, then do this:
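A sketch (the header value here is illustrative – headers go in the second parameter as a record):

```
= OData.Feed(
    "http://services.odata.org/V4/Northwind/Northwind.svc",
    [Accept = "application/json;odata.metadata=""minimal"""]
)
```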


Noting that the escape for a double quote is another double quote in your headers string.

If you want to use a boolean option, such as EnableBatch, then do this:
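A sketch, with the options record passed as the third parameter:

```
= OData.Feed(
    "http://services.odata.org/V4/Northwind/Northwind.svc",
    null,
    [EnableBatch = true]
)
```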


Note the use of the plain null with no surrounding quotes if there’s no 2nd parameter value being provided.

Similarly for numeric parameters:
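A sketch – MaxBatchSize is used here as an illustrative numeric option, and the value is arbitrary:

```
= OData.Feed(
    "http://services.odata.org/V4/Northwind/Northwind.svc",
    null,
    [MaxBatchSize = 10]
)
```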


However, Timeout needs to be expressed as a Duration type – so you need to do this manipulation:
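A sketch using #duration(days, hours, minutes, seconds) – here a 10-minute timeout:

```
= OData.Feed(
    "http://services.odata.org/V4/Northwind/Northwind.svc",
    null,
    [Timeout = #duration(0, 0, 10, 0)]
)
```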


I have done more fun things than try to work out this syntax – I hope this saves you some time!

Read More

Datazen first client test!

I got to use Datazen for the first time in anger with a client this week, and my experience was a bit of a mixed bag. There are elements of it which are neat but a fair few niggles.

Things to love

It’s pretty. The simple visualisations mean it is easy to create a nice looking, uncluttered dashboard with minimal fuss and tweaking. The responsive design means cross device functionality is pretty good and looks nice. It’s also quick to build and change content.

Quick learning tips

First up, let’s get some practical learnings shared:

  1. Use views or stored procs behind your data if hitting a SQL source. Datazen has no query editor (just a big text box) and doesn’t always handle parameters gracefully. Plus debugging is a pain as error messages are often less than helpful (e.g. “Data Preview Failed”)
  2. Set report-friendly field names in your query as you can’t always manage them in the designer – sometimes you can, sometimes you can’t.
  3. Selecting the ‘All’ option on a Selection List Navigator sends back an empty string (not NULL) as the parameter value to the query, so handle that rather than NULL.
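For example, in a stored proc behind the dashboard (a sketch – object and parameter names are hypothetical):

```sql
-- Datazen sends '' (not NULL) when 'All' is selected
CREATE PROCEDURE dbo.GetSessionCounts @State nvarchar(50)
AS
SELECT SessionDate, UserCount
FROM   dbo.SessionSummary
WHERE  @State = '' OR State = @State;
```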

Now, some major drawbacks:

  1. Consistency of cross-platform behaviour is not great. I found some drillthroughs didn’t work on iOS or Android. Windows 8 seems to be the go-to client. It’s not fatal, but for fussy clients it’s a hard sell that this cross-platform tool doesn’t work as expected.
  2. The Win 7 Publisher app is unstable, buggy and seems to have features missing – such as proper configuration for report drillthrough. It’s only been around a few weeks so it’s forgivable but if you have to use it seriously, make sure you have a Win 8 client somewhere to do your development work on.
  3. The charting is actually quite limited. There’s no stacked bar, for example. Line charts can only be by time. Labeling and colours are pretty hard to control, often rendering components useless. A great example is the category chart (bar chart) – the renderer will sometimes decide not to display category labels – which then means you just have a nice picture of some bars with no context as to what each one is, like this:
Could be cat population by state for all I know

Finally some irritations:

These are some of the things that got annoying as I used the product – not an exhaustive list – and small enough I’d expect them to be fixed relatively soon.

  1. You cannot update columns on a grid component if the underlying column names change – you have to rebuild the component (a small task, but annoying during development)
  2. You cannot set the Low/Neutral/High ranges for gauge columns on indicator grids so they match settings for other gauges
  3. You cannot align numbers – and they are aligned left which is not good dataviz practice
  4. There is no handling for outliers on heatmaps so one extreme value will screw your shading
  5. You can’t cascade drillthrough to a level below
  6. The data preview after creating a data view has no scroll bar, so if there are a lot of fields you can’t see them all
  7. There are maps provided but you have to work out how they are keyed yourself so you can connect your data (to be addressed in a future post)
  8. You can’t “oversize” the canvas so phone users can scroll down.
  9. Nobody’s using it – or at least talking about it – so help is hard to find.

A lot of irritation boils down to “I want to set that but I can’t”. This I’m sure is partly design, partly product maturity.

My takeaway

After a week with the product I get a real sense that it’s not even a v1 product yet – maybe v0.9. There are lots of niggles in the product, and not just at the back end where users can’t see them. I could tolerate the back-end weaknesses if the end-user experience was great, but there are holes there too. Still, there’s a lot of good that can be done. It’ll be interesting to see how it fares given PowerBI has such a huge feature overlap.

Read More

New DQS features in SQL 2016 CTP2

In this quick post I’m going to share the new features introduced in Data Quality Services as part of SQL Server 2016 CTP2.

Well, that didn’t take long. As far as I can tell, nothing has changed. So still hanging out for:

  • A decent UI
  • Ability to reference data services other than Azure Marketplace (e.g. MDS!)

Noises from MSFT are not exactly encouraging, so I’m not expecting much in future CTPs at this point.

Read More

Migrating Package Connections to Project Connections in SSIS

One of the best changes in SSIS 2012 was the concept of a Project Connection – a connection manager that can be used across the whole project instead of being limited to package scope, which previously meant recreating and configuring effectively the same connection in every single package. This feature is great… when you are starting a new project.

However, a recent task I got handed was to migrate a 2008 project to the 2012 project model. All very sensible – to ease maintenance, eliminate XML configurations and generally bolster security. Then I got to work…

Converting a Package Connection to a Project Connection

Ah, the easy part. Pick your connection, right click, convert to project connection and… ta daa! You have a project connection!

SSIS Project Connection

Now… what about all the other packages?

Pointing every other Package connection to the Project connection

This is a little harder. The good bit is your project connection will appear in your available connection managers. The bad bit is there is no way to tell the SSIS designer to use this one instead of your old one. You can either manually repoint every data flow, Execute SQL, script and whatever other task happens to be using the package connection – easy if your package is small and simple – or get X(ML)treme! Fortunately, thanks to this post by Jeff Garretson, I was reminded that SSIS packages are just XML, and XML can be edited much faster than a package in the designer. Jeff’s post only resolved how to fix up the Data Flow – I had a pile of Control Flow tasks to fix up too – so here’s how to get it done without hours of manual rework.

Step 1: In the designer, get the name and GUID of all connections to be replaced, and what to replace them with

You can get this from the properties window when you have a connection manager selected in the SSIS designer:

Connection Manager Properties

Step 2: Switch to code view and replace all Data Flow connections

You can find where a package connection is being used in a data flow by looking for the following in the XML:

connectionManagerID="Package.ConnectionManagers[{Package Connection Name}]"

connectionManagerRefId="Package.ConnectionManagers[{Package Connection Name}]"

Replace this text with:

connectionManagerID="{Project Connection GUID}:external"

connectionManagerRefId="Project.ConnectionManagers[{Project Connection Name}]"

Do this for all connections, then return to design view and delete the package connections.

Step 3: Refresh the connections

This shouldn’t strictly be necessary, but it seems to refresh all the metadata:

  1. Close the package
  2. Open the package in code view

Step 4: Replace all Control Flow connections

  1. Find and replace {Package Connection GUID} with {Project Connection GUID}

Step 5: Refresh the connections again

  1. Close the package, open in design view
  2. Check for errors
  3. Done!

This almost certainly could be scripted by someone with the right chops, time and motivation.
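For example, the find-and-replace steps above could be sketched in a few lines of Python. The connection names and GUIDs below are placeholders – substitute your own from the designer’s property window, and point it at your own .dtsx files:

```python
from pathlib import Path

# Placeholder values - substitute your own from the connection manager properties
PKG_CONN_NAME = "MyPackageConnection"
PKG_CONN_GUID = "{11111111-1111-1111-1111-111111111111}"
PRJ_CONN_NAME = "MyProjectConnection"
PRJ_CONN_GUID = "{22222222-2222-2222-2222-222222222222}"

def repoint_connections(xml: str) -> str:
    """Apply the Data Flow and Control Flow replacements described above."""
    # Data flow: repoint both attributes at the project connection
    xml = xml.replace(
        f'connectionManagerID="Package.ConnectionManagers[{PKG_CONN_NAME}]"',
        f'connectionManagerID="{PRJ_CONN_GUID}:external"')
    xml = xml.replace(
        f'connectionManagerRefId="Package.ConnectionManagers[{PKG_CONN_NAME}]"',
        f'connectionManagerRefId="Project.ConnectionManagers[{PRJ_CONN_NAME}]"')
    # Control flow: swap the package connection GUID for the project one
    xml = xml.replace(PKG_CONN_GUID, PRJ_CONN_GUID)
    return xml

# Rewrite every package in the current folder
for pkg in Path(".").glob("*.dtsx"):
    pkg.write_text(repoint_connections(pkg.read_text(encoding="utf-8")),
                   encoding="utf-8")
```

As ever with brute-force XML edits, back your packages up (or rely on source control) before running anything like this.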

A useful side note – if you have a Project Connection and a Package Connection with the same name, your project will start throwing errors.

Read More

PowerPivot Succinctly – new eBook by me!

Want to learn more about PowerPivot? Syncfusion have published a new eBook “PowerPivot Succinctly” written by my good self.

A quick overview of the contents:

  1. PowerPivot Model Basics
  2. Using your PowerPivot Model
  3. Sharing your PowerPivot Model
  4. A Note on Instability
  5. Deep Dive: The xVelocity Engine

It’s a free download – so go help yourself!

Read More

The fine art of starting to adopt Agile with a Zero sprint

Agile methodologies have a patchy track record in BI/DW projects. A lot of this is to do with adopting the methodologies themselves – as I’ve alluded to in prior posts, there are a heap of obstacles in the way that are cultural, process-related and ability-based.

I was discussing agile adoption with a client who readily admitted that their last attempt had failed completely. The conversation turned to the concept of the Zero sprint, and he admitted part of the reason for failure was that they had allowed zero time for their Zero sprint.

What is this Zero sprint anyway?

The reality of any technical project is that there are always certain fundamental decisions and planning processes that need to be gone through before any meaningful work can be done. Data Warehouses are particularly vulnerable to this – you need servers, an agreed design approach and a set of ETL standards before any valuable work can be done, or at least before you can work without incurring so much technical debt that your project gets sunk after the first iteration, cleaning up after itself.

So the Zero sprint is all the groundwork that needs to be done before you get started. It feels counter-agile, as you can easily spend a couple of months producing nothing of any direct value to the business/customer. The business will of course wonder where the productivity nirvana is – and, particularly galling, you need your brightest and best on it to make sure a solid foundation gets put in place, so it’s not a particularly cheap phase either.

How to structure and sell the Zero sprint

The structure part is actually pretty easy. There’s a set of things you need to establish which will form a fairly stable product backlog. Working out how long they will take isn’t that hard either as experienced team members will be able to tell you how long it takes to do pieces like the conceptual architecture. It just needs to be run like a long sprint.

Selling it as part of an Agile project is a bit harder. Because you end up not delivering any business-consumable value, you need to be very clear about what you will deliver, when you will deliver it and what value it adds to the project. It starts smelling a lot like Waterfall at this point, so if the business is skeptical that anything has changed, you have to manage their expectations well. Be clear that once the initial hump is passed, the value will flow. If you skip the Zero sprint, value will flow earlier – closer to their expectations – but soon after, the pipes will clog with technical debt (though you may want to use a different terminology!).

Read More

Managing complex security in SSAS Tabular – Yeah Nah edition

A couple of times recently I have come up against requirements which have required some fairly complex logic to apply security. One involved some fairly gnarly relationships coming from multiple directions, the other involved grinding through Hierarchies from parent nodes down to permitted viewable children.

The problem with both cases is that though the logic can sometimes be written (albeit usually in an ugly-as-hell manner), the functions needed to do so perform atrociously. For complex relationships you are obliged to take in context after context, changing filters and doing all sorts of DAX voodoo. As we know by now, avoiding relationships is good for performance. Hierarchies can be managed through the PATH function, but it’s a text operation that is far from speedy.

Let’s give a quick example of some complex security – consider the below data model:

Complex Model

Here the security controls who can see what has been spent on a Task in the FactTable object. Who can see what depends on their Role and/or the Unit they are in. There is also a 1:many relationship between a person and the logins they can use.

So for dynamic security you need to navigate from the User Id to the Person and assess what Unit they are in for the Unit based permissions. You also need to assess what Role they are in to get the Role based permissions.

I took one look at this and shuddered at the messy DAX I was going to have to write, plus how terribly it would perform.

Do it in the Cube? Yeah Nah.

So I thought “Yeah nah” and decided the cube was the wrong place to be doing this. Ultimately all I was trying to get towards was to pair a given login with a set of tasks that login would have permissions against. This is something that could easily be pushed back into the ETL layer. The logic to work it out would still be complex, but at the point of data consumption – the bit that really matters – there would be only minimal thinking by the cube engine.

So my solution enforces security through a role scanning a two column table which contains all valid pairings of login and permitted tasks to view. Very fast to execute when browsing data and a lot easier to code for. The hard work is done in loading that table, but the cube application of security is fast and easy to follow. The hierarchy equivalent is a pairing of User Id with all the nodes in the Hierarchy that are permitted to be seen.
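As a sketch, the role’s row filter then becomes a single lookup against that pre-computed pairing table (table and column names here are hypothetical – adjust to your model):

```
=CONTAINS(
    SecurityTable,
    SecurityTable[Login], USERNAME(),
    SecurityTable[TaskId], FactTable[TaskId]
)
```

Applied as the row filter on the fact table, a row is visible only if the (login, task) pairing exists in SecurityTable – no relationship navigation or PATH grinding at query time.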

As a final note, for those non-Aussie readers the expression “Yeah nah” is a colloquialism that implies that the speaker can’t be bothered with the option in front of them. For example: “Do you want a pie from the Servo, Dave?” “Yeah nah.”

Read More