SSIS Command Line Utilities part 2: dtexec

In my previous post on dtutil, I looked at the command line utility to move and alter SSIS packages. In this post I will be looking at dtexec, the command line utility that executes packages. It allows for significant alteration of run-time options, including configured values, connections, logging and so on.

dtexec: Basic Package Execution

Like dtutil, dtexec uses a similar set of basic options to identify where the package you are operating on is located: /File, /DTS & /SQL. The eagle eyed among you will note that while the full option name is the same the abbreviated is slightly different, though their operation is the same. There is also a slight difference in the /SQL additional options – because no movement of the package is taking place, you only need to specify the /Server, /Password and /User for the package being executed (rather than separate identification of the Source and Destination package). Similarly there is a /Decrypt option for providing the password to encrypted packages. For details on this, check either my previous post or MSDN documentation.

There are two important options around basic execution. First is the /X86 option. This forces SQL Agent to to execute the package in 32-bit mode on a 64-bit server, which can be important in scenarios where 64-bit compatibility isn’t available. This commonly is an issue with drivers – however this is ignored when running from the command line. When running scenarios where you have SSIS servers in your environment running in 32 and 64-bit you probably want to read this MSDN article first: 64-bit Considerations for Integration Services.

The second is the the option to /Validate the package instead of executing it. What this does is halt the package after the validation phase – this can be useful for testing in deployment scenarios. How I sometimes use it is following deployment with a dtutil script, I run a dtexec script to validate the deployed packages, which helps catch any issues with missing / incorrect configurations etc. before you do a test run. This can be used in conjunction with the /W[arnAsError] option, which will treat a warning as an Error and cause validation to fail.

As a side note, you can place comments in the dtexec command line using the /Rem option.

dtexec: Configuring Execution

Execution can be configured using a number of options. First up is the ability to get dtexec to reference a file with all your dtexec options set in it, using the /Com[mandFile] {filespec} option. This can be handy if you are in an environment where you call dtexec via a remote tool such as Control M that has a character limit in terms of the batch command it can send.

You can provide or override the configurations used in execution using the /Conf[igFile] {filespec} to point at an XML configuration file. Similarly you can override specified connection settings using the /Conn[ection] id_or_name;{connection_string} option – multiple instances of this option are allowed and you can either point at a connection name or the GUID of the connection. You can associate an SSIS logging type with package execution using the /L[ogger] classid_orprogid;{configstring} option – this could be useful in troubleshooting if you don’t want logging enabled during normal execution but want to be able to turn it on for debug.

Individual Package properties such as variable values can be /Set – and you can set as many as is required. Finally you can set the Maximum number of executables permitted using the /M[axConcurrent] {concurrent_executables} option.

dtexec: Using Checkpoints

dtexec can make a package override its defined checkpoint usage settings – though as i’ve alluded to before, be very careful when using checkpointing as you won’t always get the expected results. But the options available are to turn /CheckPointing {on\off}, specify the actual file using /CheckFile and specify the usage of Checkpoints at package start using /Restart {deny | force | ifPossible}. These options match the settings in the package of SaveCheckpoints, CheckpointFile and CheckpointUsage.

dtexec: Execution Logging and Reporting

When running packages from dtexec, you get some additional logging options which can be useful when you are trying to do a more detailed debug on a package. Some are useful for immediate visualisation in the console window as you execute while others create file based logs.

The first console option is /ConsoleLog, which allows you to write to the console window the same entries that would normally be captured by SSIS execution logging. You can select which of these columns you want to capture, the most important of these likely to be Message. Further fine tuning in the settings for this option allow you to tighten the logging to a specific task or event – again as per standard logging. Next is /Reporting  level, which tunes what type of event to capture – Errors, Information, Warnings and so on. Usually when you want this level of reporting i’d advise using the Verbose switch to just capture everything. There is also the capability to create an event exclusion list for this option. Finally, /Sum will put a count of the number of rows the next component will receive in the log.

If you want to push all this logging out to a file (and the logs can get pretty big so they can blow out the console window pretty quickly) you can write them out using the /VLog option. If you need to venture further into debugging you can create the dump files .mdmp and .tmp using either the /DumpOnError option to dump on any error or /Dump with a specified error to only generate the files under certain circumstances. These are advanced options and to understand the output i’d advise reading the MSDN article Working with Debug Dump Files first.

dtexec: Package Verification

The dtexec utility offers a few options to verify the package you are pointing at is the one you intend to execute. First up is the option to verify the build versions using the /VerifyBuild option which allows you to tell dtexec to only run the package if it has the specified Major (and optionally Minor) build version. These are the values set at the control flow level in the SSIS package you are calling. You can check the package has a valid digital signature using /VerifySigned option. Finally are the options to match the GUID of the package you are executing using /VerifyPackageId and the version GUID of the package using the /VerifyVersionID option.

All of the options above are interesting additional protections to ensure your system is running the code that it should be, but realistically they are poor and fiddly substitutes for a proper release management process.

Why use dtexec?

Most of the options above are of course covered in SQL Agent execution options, but there are scenarios where dtexec can prove useful – most commonly where SQL Agent isn’t available or isn’t permitted to be used by IT policy. It also has value in deployment and testing scenarios. Most of the time I would advocate using SQL Agent to schedule and automate your jobs, but it’s good to have such a flexible tool available.

There is also a GUI option, dtexecui.exe which allows for the configuration of almost all the above options using a simple GUI. It also has the advantage of being able to generate the dtexec command line prompt as well, making syntax errors much more avoidable.

Official MSDN documentation can be found here for 2008 and here for 2005.

AUSSUG Upcoming Sessions

In case you aren’t in the Australian SQL Server User Group, AUSSUG, there are a few upcoming sessions in Sydney which will be pretty useful – check out the official site to register. Sessions are free and always useful.

Lunchtime Wed 3rd March, 2010 – Ensuring Optimal Performance in SQL Server 2008 Based Applications with Viktor Isakov

Evening * Thu * 4th Mar 2010 – What’s new in Reporting Services 2008 R2 and PerformancePoint Services 2010 with Peter Myers, presenting what’s new in the upcoming release of Reporting Services 2008 R2 and PerformancePoint Services 2010

And TBA date in April 2010 – Knights of the SSIS Round Table – Kevin Wong, Glyn Llewelyn and myself will be presenting a series of mini demos followed by Q&A, so a chance to pick some expert brains

Hope to catch you at one of the sessions!

SSIS Command Line Utilities part 1: dtutil

I am back on the study wagon, getting ready for Exam 70-448: TS: Microsoft SQL Server 2008, Business Intelligence Development and Maintenance. One topic that is key for is understanding the command line utilities that allow you to carry out various functions across all the BI technologies. Possibly in excessive depth, but no-one can accuse the MS BI certifications of being easy.

There are two that impact SSIS – dtutil and dtexec. In simple terms, dtutil moves packages, and dtexec executes them. In this post I will be focussing on dtutil. dtexec will be covered in a later post.

dtutil: deploying packages

The main use of dtutil is to script the deployment of packages. For file operations it is equivalent to a simple copy operation, though you can add SSIS specific operations, such as changing the package GUID. However it also allows for movement to and from the SQL Server msdb store, which can’t be done via copy / paste.

Deploying a package requires you specify its source location using one of the following options:

  • /Fi (or /File) + File Path + Package Name.dtsx – for packages in the File System
  • /DT (or /DTS) + Package Name – for packages in the SSIS Package Store
  • /SQ (or /SQL) + Package Name – for packages in msdb

There are a few important things to note about the above options. Firstly for File System operations, you need to specify the .dtsx extension. Secondly, for SSIS Package store operations you don’t need to, even though it is still strictly speaking a file system operation. Finally, you don’t need to know where the Package Store is – dtutil works that out for you, which is handy if you have custom setups of the package store.

For packages located in msdb, you may also need to provide connection details using the following options:

  • /SourceS (or /SourceServer) – the source server name. If not used, it assumes localhost
  • /SourceU (or /SourceUser) – the SQL Authentication user name
  • /SourceP (or /SourcePassword) – the SQL Authentication Password

Note that the username and password options are only required if SQL Authentication is being used. If left out, dtutil will try and authenticate to msdb using Windows Authentication for the current user. 

Next up, you need to tell dtutil to copy the package and specify the destination. The option to move it is /C or (/Copy). This is then followed by the same options as specifying the source - though without the preceding backslash (i.e. Fi, DT or SQ) – then a semicolon, and finally the destination path. If you are using a SQL Server destination you may also need to provide connection details with the /DestS, /DestU and /DestP options.

As usual, an example is worth its weight in gold, so heres some to help you understand. In each case I will give the Full and Abbreviated versions – both of which do exactly the same job.

Moving a package from File to SSIS Package Store:

dtutil /File C:\Package.dtsx /Copy DTS;Package

dtutil /Fi C:\Package.dtsx /C DT;Package

Moving a package from msdb on a Named Instance of SQL Server using SQL Server Authentication to the file system

dtutil /SQL Folder\Package /SourceServer SQLSERVER.INSTANCE /SourceUser Monkey_User /SourcePassword P@$$word /Copy File;C:\Package.dtsx

dtutil /SQ Folder\Package /SourceS SQLSERVER.INSTANCE /SourceU Monkey_User /SourceP P@$$word /C FI;C:\Package.dtsx

Using a little batch scripting these can easily be converted into jobs which will deploy a bundle of packages quickly and easily.

dtutil: altering packages

dtutil is more than a deployment tool – it can also alter packages using the following options:

  • /Encrypt – copies and encrypts the package with a specified password and protection level
  • /IDRegenerate – Generates a new GUID for the package
  • /Sign  – Applies a digital signature to a package

The last two can be useful as options within the deployment process, where the Encrypt can replace the Copy operation if you want the deployed environments packages to be more secure.

dtutil: file and folder operations

Finally, dtutil can work components of the file system or MSDB folder structures, creating and deleting files and folders

  • /Delete – Delete a package
  • /Exists – Test if a package exists
  • /FCreate – Create a folder
  • /FDelete – Delete a folder
  • /FDirectory – List the contents of a folder
  • /FExists – Test if a folder exists
  • /FRename – Rename a folder
  • /Move – Move a package from one location to another

Most of these would be useful in an initial deployment or for tidying up.

dtutil: other things

There are a few leftover functions which I don’t have a categorisation, so here they are.

  • /Quiet – suppress prompts
  • /Remark – add a comment
  • /Decrypt – decrypt a package as an operation is performed

Why use dtutil?

The main reason for using dtutil is simply that the BIDS supplied deployment options, and all that mucking about with Deployment Manifests, is limited, not as scriptable, and also has been buggy, having problems with configuration files (though I believe this is now fixed in 2008, but since I stopped using it long ago, I can’t be sure). dtutil also offers additional functionality, such as being able to script package operations such as encryption.

Official MSDN documentation can be found here for 2008 and here for here for 2005.

BIDS Helper

When you are a BI Monkey like me, you probably spend a fair bit of your time buried deep in Business Intelligence Development Studio (BIDS) and probably wishing it was a little less frustrating to work with.

Enter… BIDS Helper

Over at codeplex there is a project called BIDS Helper, which adds some useful additional functionality to the standard BIDS environment.  It improves the development environment for integration, analysis and reporting services.  In this post I will focus more on the SSIS capabilities that are added.  The additional functionality is fully documented on the codeplex site, and I will highlight some of the more useful functions below – the links are to the documentation which is nice and visual, making it easy to understand.

Expression and Configuration Highlighter - this adds a highlight to components, connection managers and variables that are controlled by an expression and / or configuration, which allows you to rapidly see what is subject to change at runtime.

Expression List – this lists every property that is set by a variable (though currently not expressions that set the value of variables) – great for finding out what is actually in the package, as these things can be easily overlooked otherwise

Non-Default Properties Report – this lists any property in the package that is set to a non-default value. Very handy for debugging and seeing what is non standard.

SSIS Performance Visualization- this executes the package and presents a gantt chart of the execution time, and also can present performance over time – useful when performance tuning.

Variables Window Extensions - this adds the functionality of being able to change a variables scope – very handy for when you create a variable at the wrong scope (something I do all the time).

BIDS Helper is a handy little tool that adds a few nice functions for SSIS developers without intefering with the existing UI or design environment – I thoroughly recommend installing it, especially as it’s free!

Fuzzy Thinking

I’ve covered off the Fuzzy Lookup and Fuzzy Grouping transformations in SSIS and noticed in my research that these capabilities aren’t particularly coherently talked about on the web. So below I thought i’d collect some of the better articles for your late night reading. There isn’t all that much out there, unfortunately.

So, how does it all work?

Here are a few articles covering theory, mostly from Microsoft:

It is probably worth reiterating that because of the way the algorithms and their Q-Grams work, when longer strings are being analysed for fuzzy matches, the better the chances of a good match. When I first started using the algorithms I was doing some client matching and matched first and last names separately. Once I had a deeper understanding of the components, I started matching on a full name and the quality and reliability of matches improved significantly.

Ok, so how do I make it work?

Now, some articles covering practical implementation of the tasks:

The best thing you can do is get some sample data and play with the components to understand what it is they do. The results are impressive – if not bulletproof – and can make a great contribution to de-duplicating client data, etc.

And what does the BI Monkey have to say about it?

Fuzzy Matching is a powerful and easy to use tool which is great for approximate grouping of data for analysis where a margin of error is tolerable. It is also a great helper in data cleansing exercises. Having too much faith in the results where exact matches are required will cause you to fall over at some point, so be careful. If you are engaged in such an exercise and want some experienced support, please get in touch.

And in other news, fresh from Jamie Thompson – Fuzzy Lookup and Regex are going to become available in SQL2008R2.

If you have come across any articles that you think really contribute something to the understanding of fuzzy matching technologies in SQL Server / SSIS, please let me know or post a link in the comments so I can improve this article.

AUSSUG: Integration Services 2008 new features – Tues 10 Nov 2009

One for those Sydney resident BI Monkeys – Kasim Wirama, SQL MVP is presenting to the Sydney Chapter of the Australian SQL Server User Group on new features in SSIS for SQL 2008.

Please register at the AUSSUG website if you plan to attend – sessions are free. I expect to be there, finally able to attend for the first time in months, having missed some very interesting sounding sessions. The presentations are always useful and a great chance to meet with some of the SQL Server Gurus in Sydney.

Full details below:

Venue: Mechanics’ School of Arts, Level 1, 280 Pitt St, Sydney

Duration
5:30PM Networking (Pizza & Drinks provided)
6:00PM – 8:00PM Session

On: Tuesday, 10 November 2009

Session Details
====================

SQL Server Integration Services (SSIS) is prime ETL engine of SQL Server 2005/2008.
At this session, I would like to show new feature and improvement in SSIS 2008 and
presents practical SSIS solutions, i.e.:

* Referenced resultset
* Branching in control flow and data flow
* Centralized SSIS package configuration and logging
* Multi column output with Oledbcommand component
* Streaming (high performance) sequence number
* Dynamic source file connection with much less code than former DTS does
* Lookup improvement in SSIS 2008

The Fuzzy Grouping Transformation

Fig 1: The Fuzzy Grouping Transformation

Fig 1: The Fuzzy Grouping Transformation

In this post I will be covering the Fuzzy Grouping Transformation. The sample package can be found here for 2005 and guidelines on use are here.

What does the Fuzzy Grouping Transformation do?

The Fuzzy Grouping Transformation allows you to identify similar items within a dataset. It doesn’t – as its name perhaps suggests – actually perform any group by operations. However it does provide you with the information you need to group rows by, as what it does is help identify similar rows within a dataset. It uses the same algorithms as the Fuzzy Lookup transformation, and operates a bit like a Fuzzy Lookup except that it effectively creates its own runtime lookup table, based on the incoming data, which it then references to identify similar rows.

If this is all getting a bit confusing, let’s take a look at a example of some output:

Fuzzy Grouping Output

Fig 2: Fuzzy Grouping Output

This shows three input rows with the field FullName in which the Fuzzy Grouping transformation has been told to look for similar values. These three rows have all been decided to be similar to the FullName “Gregory Alderson”. Note that – as with the Fuzzy Lookup – the match is assigned a similarity score (in the column _Similarity_FullName). Input rows are also assigned a unique key (in the column _key_in) and when a row is matched to another, the key of that row is also stored (in the column _key_out). Note the Fuzzy Lookup component is a synchronous transformation – i.e. you will get one output row for each input row.

So what has actually happened here?

  • First, the component has loaded all the data into temporary tables.
  • Second, it scanned through using fuzzy algorithms to look for similar items.
  • Third, for each input row it output either the best match above the threshold similarity, or otherwise decided it had no matches.

You can then use this output to perform an aggregate group by operation to sum up your data by similar items, using the Aggregate Transformation.

Configuring the Fuzzy Grouping Transformation

The Fuzzy Grouping Transformation is configured over three tabs when you open the component. The first thing you need to configure is the database connection it will use to create the temporary tables it needs to perform the fuzzy matching on the Connection Manager tab. These will be built in TempDB, so ensure the user specified in your connection manager has appropriate permissions on TempDB.

Configuring the Connection Manager for the Fuzzy Grouping Transformation

Fig 3: Configuring the Connection Manager for the Fuzzy Grouping Transformation

Next, on the Columns tab, select the columns that are going to be used to identify similar rows, and those which will simply pass through the component in the ‘Available Input Columns’ box. If a column is to be grouped, check on the left hand side. If it is simply to pass through just check on the right hand side. Any column selected for grouping will be passed through automatically.

Configuring the Columns used in the Fuzzy Grouping Transformation

Fig 4: Configuring the Columns used in the Fuzzy Grouping Transformation

For each column to be analysed for similarity there are a range of settings that influence how the matching occurs and how some output is named. These work as follows:

  • Input Column – the column selected with the checkbox for grouping
  • Output Alias – the name of the Input column in the Output data flow
  • Group Output Alias – the column which will hold the best match value from the grouping
  • Match Type – Exact or Fuzzy matching
  • Minimum Similarity – The minimum similarity for matches for that column
  • Similarity Output Alias – the name of the output column containing the similarity score for that column
  • Numerals – tunes how the matches handle numbers in the data
  • Comparison Flags – fine tuning the string handling

Full details for how these settings work – especially the last two – can be found here on MSDN. I would expect the most common ones you would play with would be Match Type – when grouping on multiple columns and are confident some of them have good quality data, setting to exact match will improve the overall result for matching a row of data. Similarly with Minimum Similarity, you can tune for the quality of match you will accept on individual columns.

Note you are not limited to a single column for grouping – you can select several which will contribute to the overall similarity score for the row. In the sample package I demonstrate this in data flow 2.

Configuring the Advanced options for the Fuzzy Grouping Transformation

Fig 5: Configuring the Advanced options for the Fuzzy Grouping Transformation

Finally, on the Advanced tab we set the global values that will affect the result you get. The key one is the Similarity Threshold – this is what will set the minimum threshold of match that will be accepted overall for a row to be regarded as similar to another row. The right setting for this value will depend on the data you have and what quality of match you will accept. The Similarity score this evaluates against is a function of the similarity scores of all the columns that are being evaluated for the match.

Where should you use the Fuzzy Grouping Transformation?

The scenario I can see this being most useful in is in an analytical context. From a data cleansing point of view it can be a quick way of assessing how effective a data deduplicating exercise may be, and what columns and parameters to use. From a purely analytical approach it would be useful for doing rough groupings on data where exact results aren’t important and a degree of error can be tolerated.

Be aware that this component is a blocking transformation – i.e. it requires all input data to be consumed and processed before it will output anything, which can result in poor performance.

MSDN Documentation for the Fuzzy Grouping Transformation can be found here for 2008 and here for 2005.

If you are still struggling, try these additional resources:

If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.

The Import Column Transformation

Fig 1: The Import Column Transformation

Fig 1: The Import Column Transformation

In this post I will be covering the Import Column Transformation. The sample package can be found here for 2005 and guidelines on use are here.

What does the Import Column Transformation do?

The Import Column transformation is used to load binary data (photos, documents, media etc.) from the file system into a Data Flow. From the Data Flow it can then be loaded into a table, moved to a different location, modified – whatever you need to do. It is the reverse of the Export Column transformation, which pushes binary data out of a Data Flow and into the file system.

In my example I export product photo image thumbnails from the AdventureWorks database using the Export Column transformation, and then import them back in to a new table using the Import Column transformation.

Configuring the Import Column Transformation

The Import Column Transformation is another of those components that lacks a nice GUI, so all the configuration occurs in the advanced editor. In a similar manner to the Pivot, you need to create outputs that then have to be referenced by the inputs – though it is a bit simpler in this case. Just follow the steps below!

Firstly, select the input column that provides the file name and path of the binary data so the component can find the file it is going to import. This is simply a matter of selecting the appropriate column on the Input Columns tab, as below.

b

Fig 2: Selecting the Input Column

The second step is to create an output column to put the imported data into. Do this by switching to the Input and Output properties tab. Expand “Import Column Output”, select the Output Columns folder and click the “Add Column” button at the bottom of the form. Name the output (im my example below I called it “Imagefile”) and take note of the ID it is assigned (102 in my example).

b

Fig 3: Getting the ID of the new Output Column

You then have to expand “Import Column Input” and then the Input Columns folder. Select the input column that holds the file path of the binary data, and hidden down at the bottom of the properties list is the property “FileDataColumnID“. Update this property with the value you picked up from the Output Column that you just generated. This tells the output column where it is getting its binary data from.

b

Fig 4: Mapping the Output Column to the Input Column

This is everything you need to do to get the binary data into the Data Flow. Downstream from the Import Column transformation you manipulate the binary data using the Output Column you created within it.

Where should you use the Import Column Transformation?

This transformation would be useful for loading a database with images, documents of media files. I’ve not had call to use it yet so would be interested to hear of its practical applications.

MSDN Documentation for the Import Column Transformation can be found here for 2008 and here for 2005.

If you are still struggling, try these additional resources:

If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.

Extending SSIS with custom objects

What is the best feature of SSIS? This could be a hot topic I’m sure, but for me the real killer feature is that your tool set is not limited to out-of-the box objects. Unlike other vendors, the SSIS black box isn’t completely closed – it’s more of a grey box. While you cannot alter the basic way SSIS operates, you can can get it to perform tasks or process data in the flow as you wish.

What kind of Custom Objects are there?

There are five categories of Custom Object – Control Flow Tasks, Data Flow Pipeline Objects, ForEach Enumerators, Connection Managers and Log Providers. This is not much of a surprise as those are the categories of standard objects. The most common of these are of the Control Flow and Data Flow objects. Control Flow Tasks allow you to perform activities in the Control Flow. For example Konesan’s File Watcher watches for the arrival of a file within a folder. Data Flow Pipeline objects function in the Data flow. An example of allowing for extra connectivity options is Cozyroc’s Informix Bulk Load destination. Pipeline activities are also included – such as Rémi Pestre’s Error Output Description which adds error code descriptions.

As you are probably beginning to realise, there’s a wide range of possibilities in terms of what Custom Objects can do.

When would I want to use a Custom Object?

There are two main scenarios where you would want to use a Custom Object:

  1. You need to do something standard objects cannot do – e.g. connecting to an Amazon S3 service
  2. You have a script object that is re-used in many different packages

The first option is fairly simple to understand. If a standard object won’t cut it, but what you want to do is possible, why not build one? I can give a personal example of this. During a standard DW build I had a sprawling set of lookups, derived columns and union alls to sequentially handle each lookup for a business key to retrieve the corresponding surrogate key. This meant I needed 3 objects (at least) to handle each key. On a table with a large number of keys the package rapidly became unwieldy. It was slow to code, messy to debug and not particularly efficient as each key was handled in sequence.

I managed to knock up some code which did all the lookups in parallel. Though the code was faster, it wasn’t a great deal better from a usability perspective. I turned to Cozyroc to turn this mangled process into a neat object. The end result was a single object that could handle all the lookups through a GUI. There was no need for a developer to go anywhere near the code. This saved huge amounts of development time and made managing the packages much easier.

The second scenario is more about code management and reusability. The only way to migrate code from package to package is to cut and paste the code and reconfigure the object for the new data flow. You may have a task that is repeated in many places. A common example of this is data cleansing rules that need to be applied to many columns in a flow. Where you do this in a script where the only variable is the columns being processed, this is a good candidate to convert to a object. As in my example above this will mean the SSIS  developers can get back to using a GUI and avoid time consuming hand coding.

Where are the risks in using Custom Objects?

There are a few practical considerations to take on board before packing your project with a fistful of custom objects. These have all been valid concerns raised by clients:

  1. Support - where do I turn if the object starts behaving unexpectedly or stops working after a patch is applied? Community objects will come with limited support, and certainly no guarantees. Some companies will not permit the use of unsupported software on their servers, especially for business critical implementations.
  2. Security - again, with 3rd party objects, how do we know the object is secure, or not transmitting interesting data somewhere it shouldn’t? Because it’s a black box a malicious coder could easily get the object to scan for interesting details. For example it could be looking for Credit Card numbers, and silently send them somewhere you would rather they didn’t go.
  3. Transparency - with the object being a little “black box” – how can users be sure it is doing things reliably and efficiently? A pretty GUI is no guarantee of good code or a solid testing process.

If the risk is perceived to be small enough, then open source objects are a great place to start – most of which are listed on Codeplex at the SSIS Community Tasks and Components page (maintained by Todd McDermid). However for Enterprise projects or scenarios where you want the comfort of expert support you will need to look to a 3rd party. Off the top of my head I am aware of a few suppliers – CozyRoc, Pragmatic Works, Konesans and /n software. CozyRoc, Pragmatic Works and /n all supply sets of paid for objects, and Konesans provide some free examples of their work. All of these companies will develop objects on demand (in exchange for payment, of course!). Alternatively, you can develop your own in-house and maintain it yourself as part of the project…

How do I make my own?

If you have a script object that you want to convert to a object or want to consolidate some functionality, you need to be able to code in a language that can make a .NET dll. Common languages used are C# and VB.NET. Then you need to work out how to build a object – a good place for those BI Monkeys who are code-minded would be to look at Todd McDermid’s series on the subject that starts with the post “Converting Your Script Task Into a Custom Task, Part I – Why?“. I’ve yet to try, largely because of my aversion to code, but expect to wander into this territory soon.

So how do I get started?

I suggest you get yourself over to Codeplex at the SSIS Community Tasks and Components page and pick something that looks simple – i’d suggest the FileWatcher or Pause tasks  – download and install them on your dev box, and see how easy it is!

I’d love to hear of peoples experiences with CustomObjects – good and bad. From both sides of the fence too – users and developers!

Thanks to Todd McDermid for his input on this post

SSIS Derived Column forcing data type changes

In SQL 2005, the Derived Column Transformation used to drive me nuts because any time you changed a string value, the component automatically changed the datatype to Unicode, even if it was originally string. This meant any time you changed an Expression that returned a non-unicode string, you had to change it back.

In SQL2008 it has got worse! Now the same change of data type is forced on you, but now the only way to change the data type back is to go into the Advanced Editor and amend the data type of the Output Column there, because – apparently by design – the data type is now locked in the normal UI.

Why by design? No idea, but according to this Connect article, that’s the way it’s going to stay – please vote this up and maybe Microsoft will revisit this incredibly annoying “feature”.

Update: I’ve also discovered that even if you drag a non-unicode column from the data flow into the Expression (nothing else, just the column) it still converts it to Unicode!

Next Page »