What does the Derived Column Transformation do?
The Derived Column Transformation provides a means to change column data as it passes through the data flow. It uses the SSIS Expression Language to transform the data and allows you to replace an existing column’s value or create a new column which can use values from other columns or variables to create a new data item.
In the Derived Column editor, there are three panes to work with. The top left pane has two folders, one for Variables and one for Columns. The top right pane contains folders for the various functions available. The bottom pane is where you define the Derived Columns. Both the top panes have drag and drop functionality, so you can click and drag a function, column or variable into the Expression area of the Derived Column configuration pane.
Setting up a Derived column is straightforward – if adding a new column, provide a name for it in the Derived Column Name column. If replacing the content of an existing column, select its name in the Derived Column dropdown, and the Derived Column Name will lock down to the name of the column being replaced – in the example above see the ListPrice column for an example.
The next part is the Expression itself. For details of the language for the expression, see my posts on the SSIS Expression Language. The options you have are basic expressions, such as ROUNDing numbers, YEAR functions for dates and SUBSTRING for strings. Two things that I will raise are NULL Functions and Type Casts. Because SSIS is so strict about data typing, if you evaluate or return a Null in your expression, you have to pull a Null from the Null functions list – for example if assessing if a Date value is null, you would need to use the Null function NULL (DT_DATE). Similarly Type Casts convert data to a specific type, so for example if you wanted to put a string source type value into a float column, you would have to convert it using the Type Cast (DT_R8) [Sting Column].
The Data Type, Length, Precision, Scale and Code pages are determined automatically from the type of data you are working with – note that you cannot change the data type of an existing column. The only room you really have to move here is to change string lengths and code pages, or precision and scale for numerics when you create a new column.
In the sample package I show a few simple examples of Column manipulation, using column on column, variable on column, pure variable and pure column operations.
What are the Derived Column Transformation’s limitations?
My biggest problem with the Derived Column Transformation is that the function list is small, and worst of all, fixed. In a rare example of Cognos Data Manager being better than SSIS, Data Manager allows for the creation of custom functions that can then be re-used. SSIS offers no such flexibility, which means if you have complex operations that need to be done repeatedly (e.g. Trimming & Nulling incoming strings to clean input data) you can’t create a custom function to simplify the operation and make the operation reuseable. I’ve added a Feature Suggestion on Connect to request an package level extensible function library – please vote for it if you agree this is a big hole in the component. (Update 18 Aug 2010: Microsoft have responded, closing it and saying that “it does not meet the bar to be included in an upcoming product release”, which is a bit disappointing)
One thing which trips up a few people is that you cannot use the result of one derived column in another derived column within the same transformation. The logic behind this is pretty simple – each column is treated as a separate independent item within the data flow and can only consume columns that are input to the transformation. A derived column is effectively an output and so cannot be referenced within the component it was created in.
Lesser gripes are that the Editor is too small. There is no call out box like when setting component properties using expressions, so complex expressions quickly become difficult to read and debug. Add to that the only way to get the syntax error messages is to hover over the function that is invalid and try to read the note that appears for about 4 seconds, meaning the only real way to read long error messages is to hover over it with the mouse and do a screen capture.
Where should you use the Derived Column Transformation?
It is best used when you have to perform simple operations to change data values – for example TRIMming strings, simple IF statements and SUBSTRINGs. It also reproduces all of the functionality in the Audit and Data Conversion tasks, so if you are using those anywhere you may want to consider replacing them with a Derived Column. If your expressions are getting complicated or you repeat alot of operations, you may want to move these operations to the uglier but more powerful Script Component, which I will be covering soon.
If you are still struggling, try these additional resources:
- Andy Leonard’s post on SSIS Expression Language and the Derived Column Transformation
If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.