Extending SSIS with custom objects

What is the best feature of SSIS? This could be a hot topic I’m sure, but for me the real killer feature is that your tool set is not limited to out-of-the box objects. Unlike other vendors, the SSIS black box isn’t completely closed – it’s more of a grey box. While you cannot alter the basic way SSIS operates, you can can get it to perform tasks or process data in the flow as you wish.

What kind of Custom Objects are there?

There are five categories of Custom Object – Control Flow Tasks, Data Flow Pipeline Objects, ForEach Enumerators, Connection Managers and Log Providers. This is not much of a surprise as those are the categories of standard objects. The most common of these are of the Control Flow and Data Flow objects. Control Flow Tasks allow you to perform activities in the Control Flow. For example Konesan’s File Watcher watches for the arrival of a file within a folder. Data Flow Pipeline objects function in the Data flow. An example of allowing for extra connectivity options is Cozyroc’s Informix Bulk Load destination. Pipeline activities are also included – such as Rémi Pestre’s Error Output Description which adds error code descriptions.

As you are probably beginning to realise, there’s a wide range of possibilities in terms of what Custom Objects can do.

When would I want to use a Custom Object?

There are two main scenarios where you would want to use a Custom Object:

  1. You need to do something standard objects cannot do – e.g. connecting to an Amazon S3 service
  2. You have a script object that is re-used in many different packages

The first option is fairly simple to understand. If a standard object won’t cut it, but what you want to do is possible, why not build one? I can give a personal example of this. During a standard DW build I had a sprawling set of lookups, derived columns and union alls to sequentially handle each lookup for a business key to retrieve the corresponding surrogate key. This meant I needed 3 objects (at least) to handle each key. On a table with a large number of keys the package rapidly became unwieldy. It was slow to code, messy to debug and not particularly efficient as each key was handled in sequence.

I managed to knock up some code which did all the lookups in parallel. Though the code was faster, it wasn’t a great deal better from a usability perspective. I turned to Cozyroc to turn this mangled process into a neat object. The end result was a single object that could handle all the lookups through a GUI. There was no need for a developer to go anywhere near the code. This saved huge amounts of development time and made managing the packages much easier.

The second scenario is more about code management and reusability. The only way to migrate code from package to package is to cut and paste the code and reconfigure the object for the new data flow. You may have a task that is repeated in many places. A common example of this is data cleansing rules that need to be applied to many columns in a flow. Where you do this in a script where the only variable is the columns being processed, this is a good candidate to convert to a object. As in my example above this will mean the SSIS  developers can get back to using a GUI and avoid time consuming hand coding.

Where are the risks in using Custom Objects?

There are a few practical considerations to take on board before packing your project with a fistful of custom objects. These have all been valid concerns raised by clients:

  1. Support – where do I turn if the object starts behaving unexpectedly or stops working after a patch is applied? Community objects will come with limited support, and certainly no guarantees. Some companies will not permit the use of unsupported software on their servers, especially for business critical implementations.
  2. Security – again, with 3rd party objects, how do we know the object is secure, or not transmitting interesting data somewhere it shouldn’t? Because it’s a black box a malicious coder could easily get the object to scan for interesting details. For example it could be looking for Credit Card numbers, and silently send them somewhere you would rather they didn’t go.
  3. Transparency – with the object being a little “black box” – how can users be sure it is doing things reliably and efficiently? A pretty GUI is no guarantee of good code or a solid testing process.

If the risk is perceived to be small enough, then open source objects are a great place to start – most of which are listed on Codeplex at the SSIS Community Tasks and Components page (maintained by Todd McDermid). However for Enterprise projects or scenarios where you want the comfort of expert support you will need to look to a 3rd party. Off the top of my head I am aware of a few suppliers – CozyRoc, Pragmatic Works, Konesans and /n software. CozyRoc, Pragmatic Works and /n all supply sets of paid for objects, and Konesans provide some free examples of their work. All of these companies will develop objects on demand (in exchange for payment, of course!). Alternatively, you can develop your own in-house and maintain it yourself as part of the project…

How do I make my own?

If you have a script object that you want to convert to a object or want to consolidate some functionality, you need to be able to code in a language that can make a .NET dll. Common languages used are C# and VB.NET. Then you need to work out how to build a object – a good place for those BI Monkeys who are code-minded would be to look at Todd McDermid’s series on the subject that starts with the post “Converting Your Script Task Into a Custom Task, Part I – Why?“. I’ve yet to try, largely because of my aversion to code, but expect to wander into this territory soon.

So how do I get started?

I suggest you get yourself over to Codeplex at the SSIS Community Tasks and Components page and pick something that looks simple – i’d suggest the FileWatcher or Pause tasks  – download and install them on your dev box, and see how easy it is!

I’d love to hear of peoples experiences with CustomObjects – good and bad. From both sides of the fence too – users and developers!

Thanks to Todd McDermid for his input on this post

Read More