Sometimes dimension data is managed across multiple tables. Just to complicate things sometimes this data has independent effective date ranges on these sources. So when we try to tie our data together, trying to pick which item of data is effective when is a bit of a challenge.
A picture speaks a thousand lines of blog post, so the picture below spells it out:
Table A has a set of data with different effective periods. Table B has a set of data for the same attribute with a completely independent set of effective periods. In data terms, it looks like this:
The challenge is to join them together so we get the right combination of attributes effective at the right time, as per the first picture. Now there is a way to do it through a join with careful selection of start / end dates in a CASE statement and filtering out of records using WHERE clause. However that has the downfall that it cannot cope with records where there is no cross over of data – so records “1-“,”4-” and “5-” have to be added in later through a separate process.
The alternative is to get the window functions voodoo doll out, and stretch the brain a little so you can do it all in one query.
Step one in this exercise is realising that each tables start dates could also be end dates in the other table, and each tables end dates could also be start dates (less a day) in the other table. So we need to UNION End Dates from Table A with Start Dates from Table B, like so:
SELECT ID, [A Start Date] AS [Start Date]
SELECT ID, [B Start Date]
— All end dates are start dates + 1 too
SELECT ID, ISNULL(DATEADD(d,1,[A End Date]),’31 Dec 2099′)
SELECT ID, ISNULL(DATEADD(d,1,[B End Date]),’31 Dec 2099′)
Now, this gives us a full set of every possible start date – which is a starting point.The end result looks like this:
We can repeat the same trick for end dates and then do a cartesian join on the two sets and then we get a combination of every possible start and end date. No we need some criteria by which to select the right date pair. If we add a DATEDIFF to the resultset it becomes obvious we want to pick the smallest date range:
A WINDOW function gives us the intelligence to pick the right row. So if we apply a ROW_NUMBER() over a PARTITION of Start Date, ordering by End Date, then we just have to select the first row of each partition:
Now we have a complete set of effective date ranges on which to join our attribute tables!
Grab a copy of the query, including sample data scripts here: DateRanges
Don’t forget to check out my upcoming training “Build your Data Warehouse in SQL Server & SSIS with the BI Monkey” – a primer course for anyone wanting to build a first DW using Microsoft tools.