The Pivot Transformation

In this post I will be covering the Pivot Transformation. Due to the lack of a GUI or wizard for this component, it’s difficult to configure properly and painful to use, so I will try to walk through it as thoroughly as possible. I found the MSDN documentation on this component a bit confusing as well, so hopefully this will prove to be clearer.

The sample package can be found here for 2005 and guidelines on use are here.

What does the Pivot Transformation do?

This bit is fairly straightforward – it takes row values and converts them to columns, moving – but not aggregating -  a value as it does so. So if your data looked like this:

b

Fig 1: Pre Pivot Data

Pivoting on “Food Type” using “Cost” as a value would turn it into this:

b

Fig 2: Post Pivot Data

How to configure the Pivot Transformation

In the sample package Data Flow 1 I start with a query which delivers Line Totals and Order Quantities summarised by Product and Order Year. What I want to do is Pivot so I have Order Year on the rows, the values from Product Category as the Columns and the sum of Line Totals as the Value, as laid out below:

b

Fig 3: The Desired Pivot Results

Step 1: Configure the Inputs

The first step is to open the Pivot Transformation and on the Input Columns tab select the columns that are going to be used in the Pivot. Once that is done, switch to the Input and Output Properties tab. Here we are going to set how each column is used in the Pivot operation. This is done by setting the PivotUsage property of the Input Columns. This can be found by expanding Pivot Default Input > Input Columns and selecting the relevant Input Column. Down at the bottom of the list under the Custom header is the property PivotUsage. This can be set to one of four values:

  • 0 – the column is passed through unaffected
  • 1 – the column values become the rows of the pivot (aka the Set Key)
  • 2 – the column values become the column names of the pivot (aka the Pivot Column)
  • 3 - the column values that are pivoted in the pivot

So, in my example, I set OrderYear to 1, Product to 2 and Line Totals to 3.

b

Fig 4: Configuring the Pivot Transformation Inputs

Now something you have to configure upstream from this is that whatever column will be the rows of the pivot (PivotKeyUsage = 1) needs to be sorted for the component to work properly. The Pivot component iteslf doesn’t force this (as it does in a Merge Join) so you need to go back and make sure that your column is sorted by the time it hits the pivot. Setting the IsSorted and SortKeyPosition properties of the Set Key column upstream will make no difference as the Pivot Transformation won’t actually pay any attention to these settings, but it may cause unexpected results if the data itself isn’t sorted. You can see how this affects the output by running Data Flow 1a, which is the same as Data Flow 1 but not sorted properly.

Step 2: Configure the Outputs

Despite everything you’ve done so far, this component will have exactly zero outputs – each has to be manually defined. So you will need an output for your Rows (In my example, OrderYear) and an output for each of your Pivoted column’s values that you want to output. So in my example I will need to create an output for each Product Category that is in the source data – you cannot ignore a value as the component will return an error. For example, if I forgot to add a column for Components, I would get the error “The pivot key value “Components” is not valid” when I tried to execute.

To create an output, on the Inputs and Outputs tab expand the Pivot Default Output > OutputColumns and start clicking the Add Column button until you have enough columns. Then for each column you need to set the following properties:

  • Name – the name for the output column – so if you were pivoting a value of Bike, you could rename it to Bicycle in the output
  • PivotKeyValue – the value in the pivoted column that will go into this output
  • SourceColumn – the Lineage ID of the column that is to be used in the pivot. What this means is you need to go back to your input columns, and get the value of the LineageID property of the column that you set the PivotUsage of to 3 for pivoted columns, and the LineageID of the Column you set to PivotUsage of 1 for the Row Column.
b

Fig 5: Configuring the Pivot Transformation Outputs

Having fun yet? Unlikely, but fortunately you are done.

Multiple Pivots and the T-SQL Alternative

In the sample package Data Flow 2, I demonstrate a configuration of the Pivot Transformation that actually pivots two data values. This is achieved by adding a duplicate set of columns for the Pivot Column to receive the pivoted values in, and can be achieved mostly by duplicating the actions in Step 2 to create a new set of Pivot Columns.

I also demonstrate the Pivot T-SQL command in Data Flow 3 – this generates the same output as in Data Flow 1 but the aggregation and pivoting is done in a single operation with none of the awkward configuration required by the Pivot Transformation in SSIS. I won’t be going into the syntax for these queries here but lkinks to the MSDN documentation are at the end of this post.

Where to use the Pivot Transformation?

Because of the pain involved in configuring it, I would suggest you only use it when you absolutely have to and can’t push the operation back to the source. This is not least because the pivot – unlike Excel Pivot Tables or a T-SQL pivot – cannot aggregate, it needs aggregation first on the pivoted column. In my example, if I didn’t have an aggregate operation in the query, I would get the error “Duplicate pivot key value “Bikes”” as it tried to pivot another row which had a Product Category of “Bike”. The T-SQL Pivot query in SQL Server is far more powerful with its variety of aggregation options – even though it will initially be a bit awkward to write those queries to a non-expert. Additionally using T-SQL will allow you to lock down the number of columns – if you get a new value in the Pivot Column it won’t kill your package.

All in all the Pivot Transformation is badly implemented from a UI perspective, and weak from a data processing viewpoint due to it’s lack of aggregation capability.

MSDN Documentation for the Pivot Transformation can be found here for 2008 and here for 2005.

MSDN Documentation for the Pivot T-SQL command can be found here for 2008 and here for 2005.

If you’re still struggling, try looking at these other posts:

About BI Monkey

Comments

32 Responses to “The Pivot Transformation”
  1. Paul says:

    BI Monkey, I am enjoying your blog … good work, and please keep it going as it is a good resource and very informative on some of the finer points of development. Question for you though, have you had any experience with the slowly changing dimension component within SSIS. I am having all sorts of trouble when processing over 500,000 records as this object is very slow. Are there any tricks you know of to speed it up?

  2. BI Monkey says:

    Hi Paul, glad you are finding it useful.

    The Slowly Changing Dimension component is inherently very slow because of the way it works. Each row that comes in is checked against the reference table one at a time by a database query (you can watch this happening in SQL Profiler). This is quite quick as an individual operation but as you are finding, anything over about 10,000 records becomes unacceptably slow.

    The only real tuning you can do to the SCD component is to make sure your reference table is indexed on the business keys the SCD component is using to identify unique rows.

    The alternative is to use a custom component which caches the reference table in memory – there is one available at SQL BI called TableDifference (http://www.sqlbi.com/Projects/TableDifference/tabid/74/language/en-US/Default.aspx) – which in my testing proved to be very very quick indeed. I’m hoping that the SCD component will be able to cache the reference table in a future version.

  3. I have also tested TableDifference in the past, and being a very fast component compared to SCD algorithm, as a consultant, I must warn my customers that’s an opensource (freeware) component not endorsed by MS, and in general, they tend to “pay” the extra time SCD takes, instead of installing a non MS component in a production environment…

    By the way, congratulations for your blog, very useful, straightforward, clean…

    Michael

  4. BI Monkey says:

    Micheal, thanks.

    I think a reluctance to use custom components is a bit shortsighted – I think one of the real strengths of SSIS is that you can extend its capabilites in this way. At a prior consultancy we employed CozyRoc (www.cozyroc.com) to build us a component which saved huge amounts of development time and actually reduced risks due to cutting out potential for developer error. I can easily envisage “component packs” suited to certain tasks – a lot of DW processing is quite generic and could easily be consolidated into a handful of tasks.

  5. Scott Daves says:

    So my PivotUsage type 3 field is a varchar(1500) – that makes each of my output columns the same. It will not let me change the field type in the control even though the drop down is enabled, it gives me an error every time. Is it possible to change datatypes? Otherwise, I’m going to have to do a Data Copy component to shorten most of these fields and some derived columns to turn some of these into numeric values.

  6. BI Monkey says:

    Scott – you are right, you cannot change the datatype of a column as it passes through the Pivot component – if you see the text of the error you get when you try to change data types it is “Error at 1 > Pivot Example [Pivot [667]]: The component “Pivot” (667) does not allow setting output column datatype properties.” So it’s a bit boneheaded that it’s an option you can (try to) change.

    So yes – you’ll have to change the the column data type before or after using a data conversion.

  7. Steph says:

    This is an interesting comment. What I am trying to do is with one field, the field creates duplicate rows for each employee when they have multiple codes…can I use the Pivot transformation to put these codes into one row, so that I have one row per employee and each of the codes on the same line too? (They really want them in the same field, but I do not see that possibility).

  8. BI Monkey says:

    Steph, that is what the component does – where are you having trouble? It would help if you illustrate with a sample of your source data and the expected result.

  9. Miscellaneous Caver says:

    Yes, delivering this kind of quality in an actual paid product way past beta is nothing short of embarrassing.

    But it is better than your blog may suggest: Unmapped values in the key do not need to kill your package – there’s an ErrorRowDisposition available on the *input* column *after* setting PivotUsage. As SSIS BIDS is too braindead to size dialogs appropriately you’re bound to miss it. Setting this to RD_IgnoreFailure works; RD_RedirectRow is IMO badly broken (at least in 2005) – everyting gets redirected no matter what, and the default output will contain all rows but the pivoted columns will be NULL.

    What the transformation *does* is actually acceptable (I can forgive the redirect row bug, the refusal to move text or blob data, and a few others). I do not miss an aggregation capability – after all, pivoting in T-SQL most of the time you end up needlessly spending an aggregation on stuff you know will have exactly one non-NULL input. This component forces you to explicitly plan any needed aggregations which is not necessarily a bad thing. The UI, on the other hand, is as evil as it is because they didn’t write any! And we already know the generic editor (advanced being a misnomer) is so ugly because it is meant to be absolutely generic – notwithstanding that even considering this it is still needlessly painful.

    What I can’t forgive is that they gave us bananaware without fixing it for all of 1 RTM version and 5 service packs later.

  10. BI Monkey says:

    Agreed, I perhaps may have been a little tough, but as this post is one of the consistently most viewed on this blog, I can only assume the great majority of users remain utterly baffled by it.

    The SSIS Dev Team is supposedly putting more focus on the UI for the next release, so we may hope to see improvements in the future…

  11. Tonya says:

    I’m trying to pivot a data set with several PivotKeyValues (18) and had everything working when testing with one value and several outputs – 6 columns for different output fields based on one PivotKeyValue. When I add a new output for the same fields based on a different value of the PivotKeyValue I get this error: Error at Data Flow Task [Pivot[101]]: More than one output column (such as “disp_strt_dte_A” (3047)) is mapped to input column “disp_strt_dte” (952).

    Any ideas for getting it to work or do I need to scrap this and start over using T-SQL?

    Thanks for your help!

  12. Miscellaneous Caver says:

    Correction: ErrorRowDisposition = RD_RedirectRow seems to work under specific conditions.

    As stated, in a normal scenario with a working pivot that maps input values to output columns, and that uses RD_IgnoreFailure to skip over existing keys with missing output column assignment and/or duplicate keys per row, as soon as you change RD_IgnoreFailure to RD_RedirectRow, all those mapped columns will become NULL, and the error output will contain all rows.

    If, however, you have a pivot which only passes through fields of PivotUsage 0 or 1, and have *only* boolean tests on the pivot key (SourceColumn=-1), then switching to RD_RedirectRow seems to actually work.

    Interesting how something so far detached from the world of emotions can generate so much hate…

  13. Nadeem Ahmed says:

    BI Monkey, a very straightforward example and exactly what i was looking for.

    Nadeem
    blog.sairanadeem.com

  14. Yvonne Oak says:

    Hi,
    How would you take the example “Fig 2: Post Pivot Data” and turn it into “Fig 1: Pre Pivot Data”? Would you use the Pivot or the Unpivot. Can you do that example?
    thanks,
    Yvonne

  15. BI Monkey says:

    Hi Yvonne – you would use the Unpivot – see my post on the Unpivot transform for the example you asked for

  16. Jaanu says:

    I have a table with 16 columns and its a parent child related table.
    I have 1700 rows in the table and it is unpivoted and i need to pivot the table and flatten the hieraechy i.e from node to leaf
    except for the ID column all other columns have repeating values.
    can you suggest me how to use pivot transformation for this.

  17. BI Monkey says:

    Jaanu, can you provide a sample of source data and target data?

  18. Ronaldo says:

    Dear BI Monkey,

    I have some problems with the pivot transformation, it keeps giving me error message of “Output column “Region 1″ (549) cannot be mapped to PivotKey input column”
    What does it mean?
    My input column are :
    - Week –> pivot usage 1
    - Region –> pivot usage 2
    - Sales –> pivot usage 3

    My Output column are :
    - Week –> link to input column “Week”
    - Region 1 –> link to input column “Region”, with PivotKeyValue = Region 1
    - Region 2 –> link to input column “Region”, with PivotKeyValue = Region 2
    - Region 3 –> link to input column “Region”, with PivotKeyValue = Region 3
    - Region 4 –> link to input column “Region”, with PivotKeyValue = Region 4

    What is the mistake of that setting?
    Please kindly help
    Thank you

    Regards,

    Ronaldo

  19. BI Monkey says:

    I can’t see what you’re doing wrong from what you’ve quoted. Feel free to email me the package and I’ll take a look.

  20. Elmozamil Elamir says:

    Hi,
    Thanks you for your efforts and time to write this paper.
    I use SSIS to transform data from Adventureworks2008 database and return date and quantity, after I configure it as I read it return an error that tell me (Duplicate pivot key value “9″) Sometimes for 12 and so I googled but I didn’t ge any help please your help.

  21. Srinivas says:

    Hi BI Monkey,

    simple pivoting is fine and I follow BI Monkey, but now the requirement is like Food Type rows are changeing dynamically,

    FoodFor FoodType Cost
    BI Monkey Peanuts 10
    BI Monkey Cashews 20
    BI Monkey Coffee 15

    The above one is simple one and can do it.

    but now it has increased like below:

    FoodFor FoodType Cost
    BI Monkey Peanuts 10
    BI Monkey Cashews 20
    BI Monkey Coffee 15
    BI Monkey Juice 25 New record
    BI Monkey Coke 12 New record

    Food Type is changing Dynamically and how to handle this scenario in SSIS using Pivot is biggest Challenge.

    Please can any one help me out from this, the requirement is very urgent…

    thanks in Advance

  22. BI Monkey says:

    Hi Srinivas

    Unfortunately the Pivot component cannot handle changing keys – there’s no way to dynamically allow for new Pivot Keys.

    The only workaround I can think of is to use surrogate keys for your values, so you pivot on values 1 to 20 (or however many you think you will need). That way you can set up the component for a bigger range of values than you currently have without having to worry about what those values actually are.

  23. Steve Francis says:

    I found this example while playing with SQL 2012 CTP 3 and the interface is still rubbish. I still don’t understand why a GUI would require the user to look up a numeric code on the internet to specify column usage. I also fail to see why information in one part of the interface can’t be presented in another. Altogether the GUI interface is far too generic – just a way to type in raw parameters to an API. I think MS have forgotten the lessons from PARC (Where the GUI was invented).

  24. BI Monkey says:

    Steve – one of the features coming in SQL2012 proper should be a decent UI for the Pivot Transformation – I saw it demo’d at PASS and it does look much better.

  25. Gabeinn says:

    Thanks a lot was really useful.

    Regards,

    Gabe

  26. bhargav says:

    Thanks but for some reason i’m not able to see the custom properties when i click on the INPUT Columns, any advice

  27. Stefan says:

    I use the pivot transform in SSIS.

    As with the nature of my data, some of the numeric values that gets pivoted, results in NULL values.

    How do I go about in SSIS, to remove the NULL values and replace them with 0 instead?

    Is there a property in the pivot transform I can use, or should I test all results for NULL in a derived column control?

  28. BI Monkey says:

    You’ll have to fix it up post pivot using a derived column, as you suspected. I’m not aware of any property you can set.

Trackbacks

Check out what others are saying about this post...
  1. [...] this post I will be covering the Unpivot Transformation, the normalising sister component of the Pivot Transformation. Fortunately it is much easier to use. The sample package can be found here for 2005 and [...]

  2. Protean says:

    SSIS Pivot Component Example…

    It may be me, but as yet I haven’t seen too much written about the SSIS pivot and unpivot components.
    Whilst they are not the easiest to configure, when faced with the option of importing to SQL Server and using a UNION or the new PIVOT clauses, …



Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!