What does the Merge Join Transformation do?
The Merge Join Transformation joins two data sets on a common key (or keys). It’s the same as performing an SQL Join – the options you have are Inner Join, Full Outer Join and Left Outer Join. However the difference is that SSIS rather than SQL is doing the join, so it’s an in-memory activity. The Merge Join is a Partially Blocking transformation – it needs to receive all data per join key value pair from all inputs before it can proceed, so this can have a significant negative effect on performance.
Configuring the Merge Join Transformation
A key thing to note for the Merge Join Transformation is that the inputs must be sorted on the keys on which you are joining the datasets, for both sets. This means you either have to set the IsSorted property on your source if the data is coming in sorted, or push it through a Sort transformation before you can attempt to do the join. In the example package, I have used 2 OLE DB sources. One of these is unsorted and I sort the output using a Sort column – note this is another blocking transform and another chance to drag performance down. The other one is sorted, and I tell SSIS by setting the IsSorted property of the OLE DB Source Output to true using the advanced Editor.
In terms of configuration of the Merge Join, all you need to do is map the join keys using a drag and drop and select the columns you want on the output, as shown below:
The checkbox to the left of the column name in the join section indicates whether that column is output from the transformation. In the box below the join section, you can rename the output columns if you want. The Order column is the Sort Order, and is not configurable. The Join Key checkbox is ticked when you create a relationship.
There is no advanced editor, but if you access the properties from the design surface there are a few extra properties to access which are worth knowing about:
- MaxBuffersPerInput – which sets how many buffers to be active for one input at a time – see here for more detail
- TreatNullsAsEqual – which decides whether to join Null values or not – by default this is true.
Where should you use the Merge Join Transformation?
The answer to this is simply that it should be used wherever it is not possible to perform the join using native SQL functionality – e.g. joining data from text file sources or other non database sources. If your sources are coming from the same database, use a query and get the database to do the work. It’ll be faster and more efficient in the vast majority of cases, and the fact that it’s a blocking transformation will slow down your package overall.
If you are still struggling, try these additional resources:
- Merging multiple data sources with the MERGE JOIN task in SQL Server Integration Services SSIS by Greg Robidoux at MSSQLTips
- Using a Merge Join in SSIS – Youtube video by rcaminiti
- Joining datasets in SSIS – Using Merge Join by SSISTutorial.com
Also on advanced use of the component:
- Performing a Cross Join (Cartesian Product) in SSIS by Todd McDermid
If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.