The Pivot Transformation

In this post I will be covering the Pivot Transformation. Due to the lack of a GUI or wizard for this component, it’s difficult to configure properly and painful to use, so I will try to walk through it as thoroughly as possible. I found the MSDN documentation on this component a bit confusing as well, so hopefully this will prove to be clearer.

The sample package can be found here for 2005 and guidelines on use are here.

What does the Pivot Transformation do?

This bit is fairly straightforward – it takes row values and converts them to columns, moving – but not aggregating -  a value as it does so. So if your data looked like this:

b

Fig 1: Pre Pivot Data

Pivoting on “Food Type” using “Cost” as a value would turn it into this:

b

Fig 2: Post Pivot Data

How to configure the Pivot Transformation

In the sample package Data Flow 1 I start with a query which delivers Line Totals and Order Quantities summarised by Product and Order Year. What I want to do is Pivot so I have Order Year on the rows, the values from Product Category as the Columns and the sum of Line Totals as the Value, as laid out below:

b

Fig 3: The Desired Pivot Results

Step 1: Configure the Inputs

The first step is to open the Pivot Transformation and on the Input Columns tab select the columns that are going to be used in the Pivot. Once that is done, switch to the Input and Output Properties tab. Here we are going to set how each column is used in the Pivot operation. This is done by setting the PivotUsage property of the Input Columns. This can be found by expanding Pivot Default Input > Input Columns and selecting the relevant Input Column. Down at the bottom of the list under the Custom header is the property PivotUsage. This can be set to one of four values:

  • 0 – the column is passed through unaffected
  • 1 – the column values become the rows of the pivot (aka the Set Key)
  • 2 – the column values become the column names of the pivot (aka the Pivot Column)
  • 3 - the column values that are pivoted in the pivot

So, in my example, I set OrderYear to 1, Product to 2 and Line Totals to 3.

b

Fig 4: Configuring the Pivot Transformation Inputs

Now something you have to configure upstream from this is that whatever column will be the rows of the pivot (PivotKeyUsage = 1) needs to be sorted for the component to work properly. The Pivot component iteslf doesn’t force this (as it does in a Merge Join) so you need to go back and make sure that your column is sorted by the time it hits the pivot. Setting the IsSorted and SortKeyPosition properties of the Set Key column upstream will make no difference as the Pivot Transformation won’t actually pay any attention to these settings, but it may cause unexpected results if the data itself isn’t sorted. You can see how this affects the output by running Data Flow 1a, which is the same as Data Flow 1 but not sorted properly.

Step 2: Configure the Outputs

Despite everything you’ve done so far, this component will have exactly zero outputs – each has to be manually defined. So you will need an output for your Rows (In my example, OrderYear) and an output for each of your Pivoted column’s values that you want to output. So in my example I will need to create an output for each Product Category that is in the source data – you cannot ignore a value as the component will return an error. For example, if I forgot to add a column for Components, I would get the error “The pivot key value “Components” is not valid” when I tried to execute.

To create an output, on the Inputs and Outputs tab expand the Pivot Default Output > OutputColumns and start clicking the Add Column button until you have enough columns. Then for each column you need to set the following properties:

  • Name – the name for the output column – so if you were pivoting a value of Bike, you could rename it to Bicycle in the output
  • PivotKeyValue – the value in the pivoted column that will go into this output
  • SourceColumn – the Lineage ID of the column that is to be used in the pivot. What this means is you need to go back to your input columns, and get the value of the LineageID property of the column that you set the PivotUsage of to 3 for pivoted columns, and the LineageID of the Column you set to PivotUsage of 1 for the Row Column.
b

Fig 5: Configuring the Pivot Transformation Outputs

Having fun yet? Unlikely, but fortunately you are done.

Multiple Pivots and the T-SQL Alternative

In the sample package Data Flow 2, I demonstrate a configuration of the Pivot Transformation that actually pivots two data values. This is achieved by adding a duplicate set of columns for the Pivot Column to receive the pivoted values in, and can be achieved mostly by duplicating the actions in Step 2 to create a new set of Pivot Columns.

I also demonstrate the Pivot T-SQL command in Data Flow 3 – this generates the same output as in Data Flow 1 but the aggregation and pivoting is done in a single operation with none of the awkward configuration required by the Pivot Transformation in SSIS. I won’t be going into the syntax for these queries here but lkinks to the MSDN documentation are at the end of this post.

Where to use the Pivot Transformation?

Because of the pain involved in configuring it, I would suggest you only use it when you absolutely have to and can’t push the operation back to the source. This is not least because the pivot – unlike Excel Pivot Tables or a T-SQL pivot – cannot aggregate, it needs aggregation first on the pivoted column. In my example, if I didn’t have an aggregate operation in the query, I would get the error “Duplicate pivot key value “Bikes”” as it tried to pivot another row which had a Product Category of “Bike”. The T-SQL Pivot query in SQL Server is far more powerful with its variety of aggregation options – even though it will initially be a bit awkward to write those queries to a non-expert. Additionally using T-SQL will allow you to lock down the number of columns – if you get a new value in the Pivot Column it won’t kill your package.

All in all the Pivot Transformation is badly implemented from a UI perspective, and weak from a data processing viewpoint due to it’s lack of aggregation capability.

MSDN Documentation for the Pivot Transformation can be found here for 2008 and here for 2005.

MSDN Documentation for the Pivot T-SQL command can be found here for 2008 and here for 2005.

If you’re still struggling, try looking at these other posts:

Comments

15 Responses to “The Pivot Transformation”
  1. Paul says:

    BI Monkey, I am enjoying your blog … good work, and please keep it going as it is a good resource and very informative on some of the finer points of development. Question for you though, have you had any experience with the slowly changing dimension component within SSIS. I am having all sorts of trouble when processing over 500,000 records as this object is very slow. Are there any tricks you know of to speed it up?

  2. BI Monkey says:

    Hi Paul, glad you are finding it useful.

    The Slowly Changing Dimension component is inherently very slow because of the way it works. Each row that comes in is checked against the reference table one at a time by a database query (you can watch this happening in SQL Profiler). This is quite quick as an individual operation but as you are finding, anything over about 10,000 records becomes unacceptably slow.

    The only real tuning you can do to the SCD component is to make sure your reference table is indexed on the business keys the SCD component is using to identify unique rows.

    The alternative is to use a custom component which caches the reference table in memory – there is one available at SQL BI called TableDifference (http://www.sqlbi.com/Projects/TableDifference/tabid/74/language/en-US/Default.aspx) – which in my testing proved to be very very quick indeed. I’m hoping that the SCD component will be able to cache the reference table in a future version.

  3. I have also tested TableDifference in the past, and being a very fast component compared to SCD algorithm, as a consultant, I must warn my customers that’s an opensource (freeware) component not endorsed by MS, and in general, they tend to “pay” the extra time SCD takes, instead of installing a non MS component in a production environment…

    By the way, congratulations for your blog, very useful, straightforward, clean…

    Michael

  4. BI Monkey says:

    Micheal, thanks.

    I think a reluctance to use custom components is a bit shortsighted – I think one of the real strengths of SSIS is that you can extend its capabilites in this way. At a prior consultancy we employed CozyRoc (www.cozyroc.com) to build us a component which saved huge amounts of development time and actually reduced risks due to cutting out potential for developer error. I can easily envisage “component packs” suited to certain tasks – a lot of DW processing is quite generic and could easily be consolidated into a handful of tasks.

  5. Scott Daves says:

    So my PivotUsage type 3 field is a varchar(1500) – that makes each of my output columns the same. It will not let me change the field type in the control even though the drop down is enabled, it gives me an error every time. Is it possible to change datatypes? Otherwise, I’m going to have to do a Data Copy component to shorten most of these fields and some derived columns to turn some of these into numeric values.

  6. BI Monkey says:

    Scott – you are right, you cannot change the datatype of a column as it passes through the Pivot component – if you see the text of the error you get when you try to change data types it is “Error at 1 > Pivot Example [Pivot [667]]: The component “Pivot” (667) does not allow setting output column datatype properties.” So it’s a bit boneheaded that it’s an option you can (try to) change.

    So yes – you’ll have to change the the column data type before or after using a data conversion.

  7. Steph says:

    This is an interesting comment. What I am trying to do is with one field, the field creates duplicate rows for each employee when they have multiple codes…can I use the Pivot transformation to put these codes into one row, so that I have one row per employee and each of the codes on the same line too? (They really want them in the same field, but I do not see that possibility).

  8. BI Monkey says:

    Steph, that is what the component does – where are you having trouble? It would help if you illustrate with a sample of your source data and the expected result.

  9. Miscellaneous Caver says:

    Yes, delivering this kind of quality in an actual paid product way past beta is nothing short of embarrassing.

    But it is better than your blog may suggest: Unmapped values in the key do not need to kill your package – there’s an ErrorRowDisposition available on the *input* column *after* setting PivotUsage. As SSIS BIDS is too braindead to size dialogs appropriately you’re bound to miss it. Setting this to RD_IgnoreFailure works; RD_RedirectRow is IMO badly broken (at least in 2005) – everyting gets redirected no matter what, and the default output will contain all rows but the pivoted columns will be NULL.

    What the transformation *does* is actually acceptable (I can forgive the redirect row bug, the refusal to move text or blob data, and a few others). I do not miss an aggregation capability – after all, pivoting in T-SQL most of the time you end up needlessly spending an aggregation on stuff you know will have exactly one non-NULL input. This component forces you to explicitly plan any needed aggregations which is not necessarily a bad thing. The UI, on the other hand, is as evil as it is because they didn’t write any! And we already know the generic editor (advanced being a misnomer) is so ugly because it is meant to be absolutely generic – notwithstanding that even considering this it is still needlessly painful.

    What I can’t forgive is that they gave us bananaware without fixing it for all of 1 RTM version and 5 service packs later.

  10. BI Monkey says:

    Agreed, I perhaps may have been a little tough, but as this post is one of the consistently most viewed on this blog, I can only assume the great majority of users remain utterly baffled by it.

    The SSIS Dev Team is supposedly putting more focus on the UI for the next release, so we may hope to see improvements in the future…

  11. Tonya says:

    I’m trying to pivot a data set with several PivotKeyValues (18) and had everything working when testing with one value and several outputs – 6 columns for different output fields based on one PivotKeyValue. When I add a new output for the same fields based on a different value of the PivotKeyValue I get this error: Error at Data Flow Task [Pivot[101]]: More than one output column (such as “disp_strt_dte_A” (3047)) is mapped to input column “disp_strt_dte” (952).

    Any ideas for getting it to work or do I need to scrap this and start over using T-SQL?

    Thanks for your help!

  12. Miscellaneous Caver says:

    Correction: ErrorRowDisposition = RD_RedirectRow seems to work under specific conditions.

    As stated, in a normal scenario with a working pivot that maps input values to output columns, and that uses RD_IgnoreFailure to skip over existing keys with missing output column assignment and/or duplicate keys per row, as soon as you change RD_IgnoreFailure to RD_RedirectRow, all those mapped columns will become NULL, and the error output will contain all rows.

    If, however, you have a pivot which only passes through fields of PivotUsage 0 or 1, and have *only* boolean tests on the pivot key (SourceColumn=-1), then switching to RD_RedirectRow seems to actually work.

    Interesting how something so far detached from the world of emotions can generate so much hate…

  13. Nadeem Ahmed says:

    BI Monkey, a very straightforward example and exactly what i was looking for.

    Nadeem
    blog.sairanadeem.com

Trackbacks

Check out what others are saying about this post...
  1. [...] this post I will be covering the Unpivot Transformation, the normalising sister component of the Pivot Transformation. Fortunately it is much easier to use. The sample package can be found here for 2005 and [...]

  2. Protean says:

    SSIS Pivot Component Example…

    It may be me, but as yet I haven’t seen too much written about the SSIS pivot and unpivot components.
    Whilst they are not the easiest to configure, when faced with the option of importing to SQL Server and using a UNION or the new PIVOT clauses, …



Comments

Please leave a comment or ask the BI Monkey for more info.
To prevent spamming, comments have to be approved so they will not appear instantly.
If you want a pic to show with your comment, go get a gravatar