In SQL 2008 a new T-SQL construct was added – the MERGE operation. (Ok, pedants will know this wasn’t new to Oracle, but it was new to SQL Server).
This operation allows for the merging of a dataset into a reference dataset – which can be remarkably similar to Insert / Update operations effected by the Slowly Changing Dimension transformation. However the way it operates is very different. Instead of the SCD’s row by row evaluation approach, the MERGE operation is a set based operation. What this means is it compares the whole of the source dataset to the reference dataset in a single pass. This has significant implications for performance – on a site where I implemented this the operation which took 1,200 seconds in the SCD cut down to 51 seconds using a Merge.
There are limitations and differences to be aware of:
- You cannot directly return row counts for Insert / Update / Ignore operations in the Merge
- As it is a bulk operation a single row will cause failure of the whole batch
- There’s no GUI – just hand crafted SQL
- Less error trapping / logging options
- More flexibility in terms of actions when matches / non matches are found
The main reason why you would consider the SQL Merge – it handles Type 1, and with a little cunning, Type 2 dimensions – in a fraction of the time it takes the SCD to plod through. It’s still not as fast as a proper in memory comparison using something such as TableDifference – but it’s always good to know you have something else available in your toolkit.
- Using the SQL MERGE Statement for Slowly Changing Dimension Processing – from the Kimball Group
- How to create type 1 & 2 SCD’s using standard SSIS components (other than the SCD) (at the bottom of the post) – Benny Austin