SQL Server Data Quality Services & SSIS

So far in my posts on SSDQS we’ve looked at the Data Quality Services Client and building SSDQS Knowledge Bases. Now in practice when handling bulk data a need to reference this in routine loads is needed, and to nobody’s surprise, SSIS is the tool for the job.

The DQS Cleansing Component

So, in our (shiny, new) SSIS Toolbox we have a new component to connect to DQS – the DQS Cleansing Component:

SSIS DQS Cleansing Component

SSIS DQS Cleansing Component

The DQS cleansing component pushes a data flow to the DQS Engine for validation. This requires a special Connection Manager, the DQS Cleansing Connection Manager, which as we can see below is a simple creature:

SSIS DQS Cleansing Connection Manager

SSIS DQS Cleansing Connection Manager

The sole option at this point is to choose which DQS Server to point at. So, lets look at what we get in the SSIS Component once we use the Connection Manager:

SSIS DQS Cleansing Component Connection Manager options

SSIS DQS Cleansing Component Connection Manager options

Once again – still nice and simple – choosing your Connection Manager allows you to then pick from a list of Published Knowledge Bases. Once a KB is selected, a list of the available Domains is populated, though there is nothing you can do with this list other than review it. So next we move to the Mapping tab:

SSIS DQS Cleansing Component Mapping Tab

SSIS DQS Cleansing Component Mapping Tab

The usual suspects are there – pick your input columns in the top half of the tab and they become available for mapping in the lower half. Each input column can be mapped to a single Domain (I can’t quite see how Composite Domains work in this context). You then get three output streams – the Output, Corrected Output and Status Output. The Output is just the column passed through, Corrected is the column value corrected by the DQS Engine and the Status is the record status (which comes out as Correct, Corrected or Unknown which corresponds to the DQS Data Quality Project statuses. In the Advanced Editor you can also switch on Confidence and and Reason Outputs, which relate to matching projects.

Note that there is only a single output for the DQS Cleansing Component – if you want to send OK, Error and Invalid records to different locations, you will need to do so with a downstream Conditional Split component.

Summary

So we’ve had a quick look at the basics of automating DQS activities using SSIS, and how SSIS plugs in to the DQS Server. Subsequent posts will start digging into some practical implementation including performance.

Some further reading can be found here:

About BI Monkey

Comments

One Response to “SQL Server Data Quality Services & SSIS”

Trackbacks

Check out what others are saying about this post...


Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!