As per the connect I raised around DQS / SSIS performance, the DQS / SSIS performance was pretty sluggish. The DQS team have come back with some more possible tweaks, and this is my test of those.
The key suggestions were:
- Configure the DQS Server to accept bigger chunks of data from SSIS (see the connect for details of how to)
- Increase the SSIS properties DefaultBufferMaxRows and DefaultBufferSize so bigger chunks can be sent by SSIS to the DQS server
I revised my original tests slightly, to change from 5 domains to 4 domains so that parallel processing would be counted properly (i.e. when splitting things up it would process the 4 domains in parallel to compare fairly to the single process – as opposed to doing 4+1, as the maximum parallel tasks my processing will allow is 4).
The tweaks I made was setting the DQS Server Chunk size to 100,000, DefaultBufferMaxRows to 104857600 (10 times the default) and DefaultBufferSize to 100,000 (also 10 times the default). This meant in my sample data set of 10,000 rows everything was passed through in one big chunk.
So to remind of my test approach, I did it 3 ways (to allow for previous performance tweaks mooted by the DQS team):
- All in one pass (4 domains in a single component)
- Separate domains (process each domain in a separate path)
- Split data (process the domains in 4 x 2,500k chunks)
The results of testing
Well… it helps. But there’s still a big gap between what can be achieved in the DQS client and what you can get via SSIS. The results are below, comparing the Tuned approach to the Untuned approach, and also the DQS client directly:
I’ve deliberately skipped out the actual numbers because I’m running on a clunky VM, and since I’ve read that Stephen Few book, I’ve learned a few things about conveying messages via data visualisation.
So, here’s the takeaways:
- The new tuning approaches makes a difference – it’s about 3 times faster
- The old tuning approaches still make negligible difference
- Even tuned, SSIS is 5 times slower than the client
I’ll be feeding this all into the DQS team… and see what they come up with next. It would be nice if we can reach a point where performance is comparable with the client.