SQL Server Data Quality Services & SSIS
So far in my posts on SSDQS we’ve looked at the Data Quality Services Client and building SSDQS Knowledge Bases. Now in practice when handling bulk data a need to reference this in routine loads is needed, and to nobody’s surprise, SSIS is the tool for the job.
The DQS Cleansing Component
So, in our (shiny, new) SSIS Toolbox we have a new component to connect to DQS – the DQS Cleansing Component:
The DQS cleansing component pushes a data flow to the DQS Engine for validation. This requires a special Connection Manager, the DQS Cleansing Connection Manager, which as we can see below is a simple creature:
The sole option at this point is to choose which DQS Server to point at. So, lets look at what we get in the SSIS Component once we use the Connection Manager:
Once again – still nice and simple – choosing your Connection Manager allows you to then pick from a list of Published Knowledge Bases. Once a KB is selected, a list of the available Domains is populated, though there is nothing you can do with this list other than review it. So next we move to the Mapping tab:
The usual suspects are there – pick your input columns in the top half of the tab and they become available for mapping in the lower half. Each input column can be mapped to a single Domain (I can’t quite see how Composite Domains work in this context). You then get three output streams – the Output, Corrected Output and Status Output. The Output is just the column passed through, Corrected is the column value corrected by the DQS Engine and the Status is the record status (which comes out as Correct, Corrected or Unknown which corresponds to the DQS Data Quality Project statuses. In the Advanced Editor you can also switch on Confidence and and Reason Outputs, which relate to matching projects.
Note that there is only a single output for the DQS Cleansing Component – if you want to send OK, Error and Invalid records to different locations, you will need to do so with a downstream Conditional Split component.
So we’ve had a quick look at the basics of automating DQS activities using SSIS, and how SSIS plugs in to the DQS Server. Subsequent posts will start digging into some practical implementation including performance.
Some further reading can be found here:
- Using the SSIS DQS Cleansing Component – from the DQS Blog