The Derived Column Transformation

Fig 1: The Derived Column Transformation
Fig 1: The Derived Column Transformation

In this post I will be covering the Derived  Column Transformation. The sample package can be found here for 2005 and guidelines on use are here.

What does the Derived Column Transformation do?

The Derived Column Transformation provides a means to change column data as it passes through the data flow. It uses the SSIS Expression Language to transform the data and allows you to replace an existing column’s value or create a new column which can use values from other columns or variables to create a new data item.

b
Fig 2: Configuring the Derived Column Transformation

In the Derived Column editor, there are three panes to work with. The top left pane has two folders, one for Variables and one for Columns. The top right pane contains folders for the various functions available. The bottom pane is where you define the Derived Columns. Both the top panes have drag and drop functionality, so you can click and drag a function, column or variable into the Expression area of the Derived Column configuration pane.

Setting up a Derived column is straightforward – if adding a new column, provide a name for it in the Derived Column Name column. If replacing the content of an existing column, select its name in the Derived Column dropdown, and the Derived Column Name will lock down to the name of the column being replaced – in the example above see the ListPrice column for an example.

The next part is the Expression itself. For details of the language for the expression, see my posts on the  SSIS Expression Language. The options you have are basic expressions, such as ROUNDing numbers, YEAR functions for dates and SUBSTRING for strings. Two things that I will raise are NULL Functions and Type Casts. Because SSIS is so strict about data typing, if you evaluate or return a Null in your expression, you have to pull a Null from the Null functions list – for example if assessing if a Date value is null, you would need to use the Null function NULL (DT_DATE). Similarly Type Casts convert data to a specific type, so for example if you wanted to put a string source type value into a float column, you would have to convert it using the Type Cast (DT_R8) [Sting Column].

The Data Type, Length, Precision, Scale and Code pages are determined automatically from the type of data you are working with – note that you cannot change the data type of an existing column. The only room you really have to move here is to change string lengths and code pages, or precision and scale for numerics when you create a new column.

In the sample package I show a few simple examples of Column manipulation, using column on column, variable on column, pure variable and pure column operations.

What are the Derived Column Transformation’s limitations?

My biggest problem with the Derived Column Transformation is that the function list is small, and worst of all, fixed. In a rare example of Cognos Data Manager being better than SSIS, Data Manager allows for the creation of custom functions that can then be re-used. SSIS offers no such flexibility, which means if you have complex operations that need to be done repeatedly (e.g. Trimming & Nulling incoming strings to clean input data) you can’t create a custom function to simplify the operation and make the operation reuseable. I’ve added a Feature Suggestion on Connect to request an package level extensible function library – please vote for it if you agree this is a big hole in the component. (Update 18 Aug 2010: Microsoft have responded, closing it and saying that “it does not meet the bar to be included in an upcoming product release”, which is a bit disappointing)

One thing which trips up a few people is that you cannot use the result of one derived column in another derived column within the same transformation. The logic behind this is pretty simple – each column is treated as a separate independent item within the data flow and can only consume columns that are input to the transformation. A derived column is effectively an output and so cannot be referenced within the component it was created in.

Lesser gripes are that the Editor is too small. There is no call out box like when setting component properties using expressions, so complex expressions quickly become difficult to read and debug. Add to that the only way to get the syntax error messages is to hover over the function that is invalid and try to read the note that appears for about 4 seconds, meaning the only real way to read long error messages is to hover over it with the mouse and do a screen capture.

Where should you use the Derived Column Transformation?

It is best used when you have to perform simple operations to change data values – for example TRIMming strings, simple IF statements and SUBSTRINGs. It also reproduces all of the functionality in the Audit and Data Conversion tasks, so if you are using those anywhere you may want to consider replacing them with a Derived Column. If your expressions are getting complicated or you repeat alot of operations, you may want to move these operations to the uglier but more powerful Script Component, which I will be covering soon.

MSDN Documentation for the Derived Column Transformation can be found here for 2008 and here for 2005.

If you are still struggling, try these additional resources:

If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.

5 thoughts on “The Derived Column Transformation

  1. Hi,

    In my derived column I am trying to replace the column with expression which only going to change the length of column as per the destination. it’s work if I choose add as a new column but it’s wont change the length of field if I choose Replace as a derived column :

    (DT_STR,10, 1252) [Table_ColumnName]

    Does it’s doesn’t allow to change the length as the replace column ?

    Thanks,
    Zaim Raza.

  2. Oh, and heaven help you if you change the name of a column later!

    I have used a derived column transformation as input to an unpivot transformation, and the output of the unpivot is fed into another derived column transformation. When I change the name of a column in the first derived column transformation, the name change DOES NOT PROPAGATE through the unpivot transformation and on into the downstream, second derived column transformation. So, I am condemned to adding a new column first, with the new name, and then removing the old column! And what’s worse, there are no warnings, either. The old column name still hangs around in the second derived column transformation, until you delete it.

    So, editing the derived column name, in situ, is not going to work: you must first add a new column (which is intended to replace another column) and then delete the old one.

    Hope that made some sense. I’m pretty frustrated. My previous experience was with Ardent DataStage (yeah, back before it became Informix DataStage, and then morphed into IBM DataStage, I think) and I don’t recall such problems in that tool, but it was a while ago.

    Hopefully this helps another beleaguered SSIS programmer out there.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>