datetime

SSAS Tabular Date Compression revisited

A while ago I wrote about some tricks to reduce the size of a tabular model, in which I experienced some strange results for SSAS Tabular Date Compression. I am lucky enough to be sat in Marco Russo’s Mastering DAX course (a benefit of helping organise the course) and it all became clear.

How are datetimes stored in Vertipaq?

Under the hood, datetimes are stored in tabular as floats – with the date portion as an integer and the time portion as the floating point component (i.e. the bits after the decimal point).

This explains my results:

  • Dates compress well as they are treated as integers
  • Datetimes compress terribly as they are treated as floats
  • The less time precision you have (e.g. s vs. ms) the better the compression is as the float has lower precision
  • Baselining a datetime to a specific date to isolate the time doesn’t work as you are still compressing a float

So the recommendations stand for handling date columns, i.e.:

  • Load the date portion and time portions separately
  • Load the date portion as a date
  • Load the time portion as an integer of the unit of measurement since midnight
  • Reduce the precision of the time portion to the lowest acceptable to the business

Read More

Creating effective date ranges from multiple sources using Window Functions

Sometimes dimension data is managed across multiple tables. Just to complicate things sometimes this data has independent effective date ranges on these sources. So when we try to tie our data together, trying to pick which item of data is effective when is a bit of a challenge.

A picture speaks a thousand lines of blog post, so the picture below spells it out:

Date Ranges from multiple sources
Date Ranges from multiple sources

Table A has a set of data with different effective periods. Table B has a set of data for the same attribute with a completely independent set of effective periods. In data terms, it looks like this:

Date ranges from multiple sources sample data
Date ranges from multiple sources sample data

The challenge is to join them together so we get the right combination of attributes effective at the right time, as per the first picture. Now there is a way to do it through a join with careful selection of start / end  dates in a CASE statement and filtering out of records using  WHERE clause. However that has the downfall that it cannot cope with records where there is no cross over of data – so records  “1-“,”4-” and  “5-” have to be added in later through a separate process.

The alternative is to get the window functions voodoo doll out, and stretch the brain a little so you can do it all in one query.

Step one in this exercise is realising that each tables start dates could also be end dates in the other table, and each tables end dates could also be start dates (less a day) in the other table. So we need to UNION End Dates from Table A with Start Dates from Table B, like so:

SELECT    ID,    [A Start Date] AS [Start Date]
FROM    Table_A
UNION
SELECT    ID,    [B Start Date]
FROM    Table_B
UNION
— All end dates are start dates + 1 too
SELECT    ID,    ISNULL(DATEADD(d,1,[A End Date]),’31 Dec 2099′)
FROM    Table_A
UNION
SELECT    ID,    ISNULL(DATEADD(d,1,[B End Date]),’31 Dec 2099′)
FROM    Table_B

Now, this gives us a full set of every possible start date – which is a starting point.The end result looks like this:

Union Results
Union Results

We can repeat the same trick for end dates and then do a cartesian join on the two sets and then we get a combination of every possible start and end date. No we need some criteria by which to select the right date pair. If we add a DATEDIFF to the resultset it becomes obvious we want to pick the smallest date range:

Crossjoin results with DATEDIFF
Crossjoin results with DATEDIFF

A WINDOW function gives us the intelligence to pick the right row. So if we apply a ROW_NUMBER() over a PARTITION of Start Date, ordering by End Date, then we just have to select the first row of each partition:

The final result
The final result

Now we have a complete set of effective date ranges on which to join our attribute tables!

Grab a copy of the query, including sample data scripts here: DateRanges

Don’t forget to check out my upcoming training “Build your Data Warehouse in SQL Server & SSIS with the BI Monkey” – a  primer course for anyone wanting to build a first DW using Microsoft tools.

 

Read More