The Script Transformation part 1 – a simple Transformation

Fig 1: The Script Transformation

Fig 1: The Script Transformation

In this post I will be covering the Script Transformation. The sample package can be found here for 2005 and guidelines on use are here.

What does the Script Transformation do?

The question here should really be what can’t it do? The Script Transform fills in the gap when standard components don’t fit the bill by allowing you to create, consume or access rows and columns in the Data Flow and process them within VB.Net code (and also C# in 2008). This opens up a whole array of functionality to access as both VB.Net and C# are powerful and flexible languages. The Script Transformation can function as a Source (providing rows to the Data Flow), as a Destination (consuming rows from the Data Flow) or as a Transformation (changing or creating column values in the Data Flow).

Functioning as a Transformation, you aren’t restricted to just row in, row out operations either – you add new outputs, create multiple rows from single rows and create single rows from multiple rows. The capacity to do impressive tricks with your data is – well, impressive! Because of this array of options I will break each one out into a separate post. This first post will cover a simple one row in – one row out transformation.

Sadly there are some downsides to all this, two of which really stand out for me. The first of these is if you aren’t a programmer (I never moved much beyond VBA) then writing the code, debugging the code, or even knowing what can and can’t be done in the code can make working with this component a bit of a struggle. The second is you lose a lot of visibility over what is being done in the component – unlike with most other transforms there is no nice GUI to show what column is going where and what is being done to it. You need to be able to actually read the code to understand what is going on, and I must warn that the BI Monkey becomes one Angry Ape when code is insufficiently commented!

Configuring the Script Transformation

It is fairly easy to set up the Script Transformation, but you need to use a little more of the Advanced Editor type features than basic developers are probably used to. Key actions are selecting input columns, defining output columns, choosing input variables and connection managers being used within the component.

First up simply check the columns from the Input that you want to access in the script component. By “access” I mean read or alter the value of.

b

Fig 2: Selecting the Input Columns

Second, define the output columns. If you are adding new columns to the Data Flow as I do in the example, click the “Add Column” button which becomes enabled when you select the “Output Columns” folder. Then name it and select the data type. By default when using a Script Transformation as a Transformation a single output Output 0 is created for you to add columns to.

b

Fig 3: Configuring the Output Columns

Finally (in this case – I won’t be using any connection managers here and will cover those in a future post) enter the variables you want to be able to access in the script – there are two options – ReadOnlyVariables and ReadWriteVariables. Fairly self explanatory – if you want to change the value of the variable in the script enter it into the ReadWriteVariables line – and if you don’t want the value to change enter it into the ReadOnlyVariables line. Two quick notes of warning – if there’s a space in your list of variables in the ReadOnlyVariables line it will cause an error in 2005 Script tasks. Secondly, remember variable names are case sensitive.

b

Fig 4: Specifying the Variables

The last thing you need to do is click on the Design Script button that will open your code editor. There is a commented section where you can add your code.

Below is my sample code from the package – there’s nothing too fancy going on here. Note how the columns selected in the Inputs are available as a property of the Row - they simply pop up on Intellisense as you code. Similarly variables are accessible from Me.Variables.

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

‘ Use a standard VB Function
Row.ShipYear = Row.ShipDate.Year

‘ Use a variable in the Me.Variables.<VariableName> format
Row.DoubleDue = Row.TotalDue * CType(Me.Variables.Multiplier, Decimal)

End Sub

I’ll dig into how code is structured in one of the future posts in this mini-series.

Where should you use the Script Transformation?

The Script transformation is there to be used when native SSIS component functionality doesn’t meet your needs – the most common use I have had for it is when the Derived Column editor doesn’t give me what I need – such as when I want to use Regular Expressions, or reuse a function across many columns.

MSDN Documentation for the Script Transformation can be found here for 2008 and here for 2005.

If you are still struggling, try these additional resources:

If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.

About BI Monkey

Comments

8 Responses to “The Script Transformation part 1 – a simple Transformation”
  1. Amarjot says:

    Hi,

    Thank you for the article. I want to know if there is a way we can add input columns to the script transformation task. I am working with flat files which have variable columns ranging from 50 to 250. All these columns have same data type.

    I was trying to use Unpivot task but then again I have to check all the columns manually.

    I will truly appreciate any help or thoughts on this.

    Thanks
    Amarjot

  2. BI Monkey says:

    Hi Amarjot… if I understand correctly you are asking if it is possible to dynamically set the number outputs on a script transform. The answer to that is no – the number of inputs / outputs is fixed at design time and cannot be changed during runtime.

    If you want to handle a flat file that has a varying number of columns, your best bet is to open that file in a script task that has enough outputs to handle all possible outcomes, and use the script to populate the output columns needed.

  3. Kate says:

    I’m hoping you can help – I have a script task to accomplish exporting a varying number of columns to a flat file. The number of columns are returned correctly, but the data in the rows is not complete. In most cases, the last row is truncated somewhere in one of the fields but the location isn’t always the same field. In other cases, the file ends prior to the last row from my data table so the last rows are missing and the last row that does display is truncated. I can’t find any reason or solution. Any thoughts?

  4. BI Monkey says:

    That sounds very odd indeed, and not behaviour i’ve ever seen. I hate to say it, but my first suspect would be the script. Some other common sense checks – is the data definitely getting into the component? Are the data sizes of the fields correct?

  5. Peter says:

    Hi,

    I hope you could give me some ideas.
    I have some experience with SSIS, but never done anything like that.

    I was tasked with generating a non-standard excel export (via SSIS), something that looks more like a custom made access (or crystal) report.

    First, it does not have a standard header – instead it has three header rows with some cells being blank. Most of the data in this header is static, but some will be dynamic.

    Second, the actual data output would consist of 2 lines (excel rows) per each data record. The problem is that the first and the second row (of each pair) would contain different data and be formatted differently (currency vs. numeric).

    So, it looks like I would need to generate each output row and format its cells on the fly.
    Is SSIS Data Flow Transformation Script Component the right tool for this task? Could it be used to format each cell individually?

    Thanks,
    Peter

  6. BI Monkey says:

    Hi Peter

    You’re going to need a mix of components to achieve this result, though I’m not sure SSIS is the best tool for the job – SSRS is much more suited to this.

    If you absolutely must use SSIS, you will have to :
    a) Use a script component in the Control Flow to create your spreadsheet and populate the header
    b) use the script component in the Data Flow to populate (but not format) the alternating rows of data
    c) use another Script component in the Control Flow to apply formatting

    That’s pretty ugly – SSIS would not be my tool of choice here.

  7. Phil says:

    Hi,

    I was wondering if there was a way to reference the output columns through an array and not the variable names?

    I need to assign values to output columns based on certain conditions that are met. This is all within a loop.

    if (whatfield == 0)
    {
    Row.ColumnA = “”;
    whatfield++;
    }
    else if (whatfield == 1)
    {
    Row.ColumnB = “”;
    whatfield++;
    }
    else …

    As you can see this is pain staking to do for every field that comes in and can not be used in other projects without changing the names for the new project.

    Is this even possible?

    Thanks

    Phil

    I would rather do Row.ColumnArray[value] = “”; and just loop through the output columns.

  8. BI Monkey says:

    Off the top of my head, no – this may be easier in C# but I’m not a great coder so I wouldn’t take my word for it.

Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!