DQS / SSIS Performance in SQL2012 RC0 – following some tweaks

As per the connect I raised around DQS / SSIS performance, the DQS / SSIS performance was pretty sluggish. The DQS team have come back with some more possible tweaks, and this is my test of those.

The key suggestions were:

  • Configure the DQS Server to accept bigger chunks of data from SSIS (see the connect for details of how to)
  • Increase the SSIS properties DefaultBufferMaxRows and DefaultBufferSize so bigger chunks can be sent by SSIS to the DQS server

I revised my original tests slightly, to change from 5 domains to 4 domains so that parallel processing would be counted properly (i.e. when splitting things up it would process the 4 domains in parallel to compare fairly to the single process – as opposed to doing 4+1, as the maximum parallel tasks my processing will allow is 4).

The tweaks I made was setting the DQS Server Chunk size to 100,000, DefaultBufferMaxRows to 104857600 (10 times the default) and DefaultBufferSize to 100,000 (also 10 times the default). This meant in my sample data set of 10,000 rows everything was passed through in one big chunk.

So to remind of my test approach, I did it 3 ways (to allow for previous performance tweaks mooted by the DQS team):

  • All in one pass (4 domains in a single component)
  • Separate domains (process each domain in a separate path)
  • Split data (process the domains in 4 x 2,500k chunks)

The results of testing

Well… it helps. But there’s still a big gap between what can be achieved in the DQS client and what you can get via SSIS. The results are below, comparing the Tuned approach to the Untuned approach, and also the DQS client directly:

DQS Performance with SSIS
DQS Performance with SSIS

I’ve deliberately skipped out the actual numbers because I’m running on a clunky VM, and since I’ve read that Stephen Few book, I’ve learned a few things about conveying messages via data visualisation.

So, here’s the takeaways:

  • The new tuning approaches makes a difference – it’s about 3 times faster
  • The old tuning approaches still make negligible difference
  • Even tuned, SSIS is 5 times slower than the client

I’ll be feeding this all into the DQS team… and see what they come up with next. It would be nice if we can reach a point where performance is comparable with the client.

2 thoughts on “DQS / SSIS Performance in SQL2012 RC0 – following some tweaks

  1. I appreciate the article, as it has helped me get the SSIS dqs cleansing component to process much faster for me. Wanted to note (as it slowed me down and therefore might hinder someone else) that the query to update DQS Server is:
    USE [DQS_MAIN]
    GO

    DECLARE @newDCChunkSize INT = 100000

    UPDATE [dbo].[A_CONFIGURATION]
    SET [VALUE].modify(‘replace value of (/configuration/param[@name=”DCChunkSize”]/@value)[1] with sql:variable(“@newDCChunkSize”) ‘)
    WHERE [Name] = ‘DataQualityParameterFlow’
    GO

    And also note that DefaultBufferMaxRows and DefaultBufferSize are properties of the Data Flow task in which one is running the DQS Cleaning transform. I mistakenly thought these properties were on the DQS transforms.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>