The Script Transformation part 1 – a simple Transformation

Fig 1: The Script Transformation
Fig 1: The Script Transformation

In this post I will be covering the Script Transformation. The sample package can be found here for 2005 and guidelines on use are here.

What does the Script Transformation do?

The question here should really be what can’t it do? The Script Transform fills in the gap when standard components don’t fit the bill by allowing you to create, consume or access rows and columns in the Data Flow and process them within VB.Net code (and also C# in 2008). This opens up a whole array of functionality to access as both VB.Net and C# are powerful and flexible languages. The Script Transformation can function as a Source (providing rows to the Data Flow), as a Destination (consuming rows from the Data Flow) or as a Transformation (changing or creating column values in the Data Flow).

Functioning as a Transformation, you aren’t restricted to just row in, row out operations either – you add new outputs, create multiple rows from single rows and create single rows from multiple rows. The capacity to do impressive tricks with your data is – well, impressive! Because of this array of options I will break each one out into a separate post. This first post will cover a simple one row in – one row out transformation.

Sadly there are some downsides to all this, two of which really stand out for me. The first of these is if you aren’t a programmer (I never moved much beyond VBA) then writing the code, debugging the code, or even knowing what can and can’t be done in the code can make working with this component a bit of a struggle. The second is you lose a lot of visibility over what is being done in the component – unlike with most other transforms there is no nice GUI to show what column is going where and what is being done to it. You need to be able to actually read the code to understand what is going on, and I must warn that the BI Monkey becomes one Angry Ape when code is insufficiently commented!

Configuring the Script Transformation

It is fairly easy to set up the Script Transformation, but you need to use a little more of the Advanced Editor type features than basic developers are probably used to. Key actions are selecting input columns, defining output columns, choosing input variables and connection managers being used within the component.

First up simply check the columns from the Input that you want to access in the script component. By “access” I mean read or alter the value of.

b
Fig 2: Selecting the Input Columns

Second, define the output columns. If you are adding new columns to the Data Flow as I do in the example, click the “Add Column” button which becomes enabled when you select the “Output Columns” folder. Then name it and select the data type. By default when using a Script Transformation as a Transformation a single output Output 0 is created for you to add columns to.

b
Fig 3: Configuring the Output Columns

Finally (in this case – I won’t be using any connection managers here and will cover those in a future post) enter the variables you want to be able to access in the script – there are two options – ReadOnlyVariables and ReadWriteVariables. Fairly self explanatory – if you want to change the value of the variable in the script enter it into the ReadWriteVariables line – and if you don’t want the value to change enter it into the ReadOnlyVariables line. Two quick notes of warning – if there’s a space in your list of variables in the ReadOnlyVariables line it will cause an error in 2005 Script tasks. Secondly, remember variable names are case sensitive.

b
Fig 4: Specifying the Variables

The last thing you need to do is click on the Design Script button that will open your code editor. There is a commented section where you can add your code.

Below is my sample code from the package – there’s nothing too fancy going on here. Note how the columns selected in the Inputs are available as a property of the Row – they simply pop up on Intellisense as you code. Similarly variables are accessible from Me.Variables.

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

‘ Use a standard VB Function
Row.ShipYear = Row.ShipDate.Year

‘ Use a variable in the Me.Variables.<VariableName> format
Row.DoubleDue = Row.TotalDue * CType(Me.Variables.Multiplier, Decimal)

End Sub

I’ll dig into how code is structured in one of the future posts in this mini-series.

Where should you use the Script Transformation?

The Script transformation is there to be used when native SSIS component functionality doesn’t meet your needs – the most common use I have had for it is when the Derived Column editor doesn’t give me what I need – such as when I want to use Regular Expressions, or reuse a function across many columns.

MSDN Documentation for the Script Transformation can be found here for 2008 and here for 2005.

If you are still struggling, try these additional resources:

If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.

17 thoughts on “The Script Transformation part 1 – a simple Transformation

  1. Hi,

    Thank you for the article. I want to know if there is a way we can add input columns to the script transformation task. I am working with flat files which have variable columns ranging from 50 to 250. All these columns have same data type.

    I was trying to use Unpivot task but then again I have to check all the columns manually.

    I will truly appreciate any help or thoughts on this.

    Thanks
    Amarjot

  2. Hi Amarjot… if I understand correctly you are asking if it is possible to dynamically set the number outputs on a script transform. The answer to that is no – the number of inputs / outputs is fixed at design time and cannot be changed during runtime.

    If you want to handle a flat file that has a varying number of columns, your best bet is to open that file in a script task that has enough outputs to handle all possible outcomes, and use the script to populate the output columns needed.

  3. I’m hoping you can help – I have a script task to accomplish exporting a varying number of columns to a flat file. The number of columns are returned correctly, but the data in the rows is not complete. In most cases, the last row is truncated somewhere in one of the fields but the location isn’t always the same field. In other cases, the file ends prior to the last row from my data table so the last rows are missing and the last row that does display is truncated. I can’t find any reason or solution. Any thoughts?

  4. That sounds very odd indeed, and not behaviour i’ve ever seen. I hate to say it, but my first suspect would be the script. Some other common sense checks – is the data definitely getting into the component? Are the data sizes of the fields correct?

  5. Hi,

    I hope you could give me some ideas.
    I have some experience with SSIS, but never done anything like that.

    I was tasked with generating a non-standard excel export (via SSIS), something that looks more like a custom made access (or crystal) report.

    First, it does not have a standard header – instead it has three header rows with some cells being blank. Most of the data in this header is static, but some will be dynamic.

    Second, the actual data output would consist of 2 lines (excel rows) per each data record. The problem is that the first and the second row (of each pair) would contain different data and be formatted differently (currency vs. numeric).

    So, it looks like I would need to generate each output row and format its cells on the fly.
    Is SSIS Data Flow Transformation Script Component the right tool for this task? Could it be used to format each cell individually?

    Thanks,
    Peter

  6. Hi Peter

    You’re going to need a mix of components to achieve this result, though I’m not sure SSIS is the best tool for the job – SSRS is much more suited to this.

    If you absolutely must use SSIS, you will have to :
    a) Use a script component in the Control Flow to create your spreadsheet and populate the header
    b) use the script component in the Data Flow to populate (but not format) the alternating rows of data
    c) use another Script component in the Control Flow to apply formatting

    That’s pretty ugly – SSIS would not be my tool of choice here.

  7. Hi,

    I was wondering if there was a way to reference the output columns through an array and not the variable names?

    I need to assign values to output columns based on certain conditions that are met. This is all within a loop.

    if (whatfield == 0)
    {
    Row.ColumnA = “”;
    whatfield++;
    }
    else if (whatfield == 1)
    {
    Row.ColumnB = “”;
    whatfield++;
    }
    else …

    As you can see this is pain staking to do for every field that comes in and can not be used in other projects without changing the names for the new project.

    Is this even possible?

    Thanks

    Phil

    I would rather do Row.ColumnArray[value] = “”; and just loop through the output columns.

  8. I have a table in one system with about 70 columns. I need to import that to another system and table and change the format of one column on the way. I was not able to reformat in the SQL query – it causes an error- so I am trying to do it in a Script Component. While I can select the input columns from a dropdown, it appears I have to recreate them one by one on the output side, including the data type. This is tedious.

    Is there a better way?

    Steve

  9. hi,

    Great posts!!
    I am really new is SSIS and a I have question.

    I have a table with balances (from bank accounts) I want to do some culculations with each balance of every customer (I have the math types for the culculations) but i found difficult to desing the Flow.

    I start with my DB and i choose the table i want and then??? how i add my math types??

    Thanks a lot!!

  10. StringBuilder SourceAccount = new StringBuilder();
    public override void PreExecute()
    {
    base.PreExecute();

    /*
    Add your code here for preprocessing or remove if not needed
    */
    }

    public override void PostExecute()
    {
    base.PostExecute();
    /*
    Add your code here for postprocessing or remove if not needed
    You can set read/write variables here, for example:
    Variables.MyIntVar = 100

    */

    Variables.NewAccountNumber = SourceAccount.ToString();
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {

    SourceAccount.Append(Row.SourceAccountNumber + ” – ” + Row.SourceAccountName + Environment.NewLine);

    /*
    Add your code here
    */
    }

    }

  11. Hi,

    I want to use a script task (in Control Flow) before Data Flow Task to count the number of rows in the file.
    Do you know the solution using C#? Can you help?

  12. You could do this using some custom code to open the file and read every line, increasing a counter as you go, then pass that out as a variable.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>