What does the Merge Transformation do?
The Merge Transformation merges two sorted data sets and outputs a sorted data set – and by Merge it really means a “Sorted Union All”. You provide it with the two sorted sets, specify the keys on which they are sorted, and the “merge” then takes place, ordering the output on those keys. In my example package, I merge two datasets – each with 290 rows – and get a 580 row dataset as output.
You can see above you have to specify the sort keys (you can have multiple keys) and then the mappings for the columns.The sort keys must have the property IsSorted set to True – this will be done automatically by the Sort component, but will need to be set manually in the source if the data is sorted outside of SSIS – this is available in the advanced properties of the output columns of a component.
For simplicity in the example I have simply merged the same dataset, though in one copy I have left some columns off to show that you don’t have to match the columns, just the sort keys. All that happens in unmapped columns is that a null is entered for that columns value in the output dataset.
When would you use the Merge Transformation?
Technically, it should be used to merge two sorted datasets to get a sorted merged output data set. In practice, I fail to see much advantage in using this component -The only scenario I can see this being genuinely useful is when you have to Union two very large sorted datasets, need the output to keep that sort, and cannot afford the memory overhead of a blocking Sort component after the Union which will have to recieve all the data before it can do that sort.
The Union All component can accept more than 2 inputs, doesn’t require inputs to be sorted, and if you really need the output sorting, then you can sort the output using a Sort component. I demonstrate this in the sample package in data Flow 2, which returns the same results as the Merge Data Flow but with one less component.