Some of you may have heard of the SSIS Balanced Data Distributor data flow component. It’s a neat gizmo that acts (sort of) like a mix of a Multicast and a Conditional Split, sending different buffers to different destinations. The idea of the component is to increase parallelism in data flows and thus speed up processing. I suggest you read this post by Matt Masson on the SSIS Team Blog to get an idea of how it works.
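To make the "different buffers to different destinations" idea concrete, here is a rough conceptual sketch in Python. It assumes a simple round-robin hand-out of whole buffers, which is my simplification and not necessarily how the real component schedules work. The function name `distribute_buffers` and the 10,000-row buffer size are hypothetical, picked only for illustration (the buffer size here has nothing to do with the DQS chunk size).

```python
from itertools import cycle

def distribute_buffers(buffers, n_outputs):
    """Hand each whole buffer to exactly one output, round-robin.

    Unlike a Multicast, no buffer is copied to every output; unlike a
    Conditional Split, the routing does not depend on the row contents.
    """
    outputs = [[] for _ in range(n_outputs)]
    for target, buf in zip(cycle(range(n_outputs)), buffers):
        outputs[target].append(buf)
    return outputs

# Hypothetical input: 150,000 rows arriving as buffers of 10,000 rows each.
buffers = [range(start, start + 10_000) for start in range(0, 150_000, 10_000)]

outputs = distribute_buffers(buffers, n_outputs=4)
rows_per_output = [sum(len(buf) for buf in out) for out in outputs]
print(rows_per_output)  # [40000, 40000, 40000, 30000]
```

Each output sees roughly a quarter of the rows, so four downstream components can each chew on their own share at the same time.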
I thought I’d give it a road test in conjunction with the DQS Cleansing component to see if it offered any performance benefits. I ran DQS with CU1 installed to pick up the performance improvements it delivers, and changed the “DQS CHUNK SIZE” parameter to 10,000 (see here) so it didn’t run like a total dog from the outset.
I pumped 150,000 rows through 5 simple domains and here are the results:
Round 1: the DQS Baseline
Pushing all the rows through a single DQS Cleansing component took a whopping 21 minutes. Ouch.
Round 2: DQS with the BDD (Balanced Data Distributor)
Splitting the flow using the BDD to four identical targets (to match the number of cores I have available) took 10 minutes. Still not great, but it cuts the run time by just over half (about 52%), which is roughly a 2x speedup. My VM was pretty much maxed out during this run, so hardware limitations may well have capped the gain.
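For anyone who wants to see the arithmetic behind those two rounds, here is a quick Python sketch using only the figures above (150,000 rows, 21 minutes, 10 minutes, four paths). The "parallel efficiency" figure is just speedup divided by the number of paths, a rough yardstick I'm adding for illustration rather than anything DQS or SSIS reports.

```python
rows = 150_000
baseline_minutes = 21   # Round 1: single DQS Cleansing component
bdd_minutes = 10        # Round 2: BDD feeding four DQS Cleansing components
paths = 4

baseline_throughput = rows / baseline_minutes   # rows per minute
bdd_throughput = rows / bdd_minutes

speedup = baseline_minutes / bdd_minutes        # how many times faster
time_saved = 1 - bdd_minutes / baseline_minutes # fraction of run time removed
efficiency = speedup / paths                    # 1.0 would be perfect scaling

print(f"Baseline: {baseline_throughput:,.0f} rows/min")
print(f"With BDD: {bdd_throughput:,.0f} rows/min")
print(f"Speedup: {speedup:.1f}x, run time cut by {time_saved:.0%}")
print(f"Parallel efficiency across {paths} paths: {efficiency:.0%}")
```

Four paths giving about a 2x speedup means roughly half of the theoretical gain was realised, which fits with the VM being maxed out.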
So, in conclusion, the Balanced Data Distributor is a good tool to help with those DQS performance issues: in this test it roughly halved the run time.