SQL Server Data Quality Services in SQL2012 RC0 – Part 1
So the key news – in case you missed it – is that SQL2012 RC0 has been made available for download. After a few battles with the Installer – first the known issue with the Distributed Replay users – then some things requiring manual installs of KB’s to get the installer to run through – I have a VM set up with it.
The DQS team have posted about the improvements made in the DQS blog – and the one I really wanted to focus on was performance via SSIS as the CTP3 offering was not viable for large data sets. So this Part 1 post is all about the performance of DQS via SSIS in RC0.
So, I set up a Knowledge Base in the same way as I did for testing CTP3, with 5 duplicate domains – just evaluating an Integer with a single rule saying that integer had to be greater than a value to be valid. Then I ran two sets of values (5k & 10k rows) through the KB via SSIS, evaluating 1,2,3,4 and 5 fields.
So how does DQS Perform?
Here’s the results- the value in the grid is Seconds taken to process.
So – have we moved on from CTP3? A bit. But not much, and enough to be accounted for by a different VM setup (as a reminder CTP3 processing 5k rows took from 20 to 45 seconds for 1-5 columns). I accept a VM may be slower than a properly configured server, but even if it was twice as quick it would still not be a viable option for industrial use.
Looking at execution time changes by number of columns / rows processed, the time taken seems to be pretty linear as rows and columns increase, so it appears DQS performance can be evaluated pretty much as:
DQS Execution Time = Spin Up Time + (Columns * (Rows * Row Process Time))
Where:
Spin Up Time = time taken for DQS engine to start (Constant)
Columns = number of columns being evaluated (Variable)
Rows = number of rows being processed (Variable)
Row Process Time = time taken to process a single row (Constant)
On my VM, Spin Up Time seems to be 7 seconds, and Rows Process Time = 0.0014 seconds.
So, if we had to validate 10 columns on 1,000,000 rows of data (not too crazy) -
DQS Execution Time = Spin Up Time + (Columns * (Rows * Row Process Time))
DQS Execution Time = 7 + (10 * (1,000,000 * 0.0014))
DQS Execution Time = 7 + (10 * (1,000,000 * 0.0014))
DQS Execution Time = 14007 seconds = 233 minutes = 3.9 hours
Which effectively rules it out as a viable production process. Note of course that my formula doesn’t make any allowance for rule complexity.
Is DQS Production ready?
As per anything, the answer is – It depends. For validating small data sets it’s in the realms of slow, but probably acceptable. For big data sets, I’d have to say no – I couldn’t use it in a production environment to validate large sets of data. I’ve added a Connect suggestion to get this on the teams radar.

how to install the DQS component. where it is available and how to download all the details needed for me.
Just download SQL2012 RC0 (first link in the post) and it’s all contained in there.