SSIS Derived Column forcing data type changes

In SQL 2005, the Derived Column Transformation used to drive me nuts because any time you changed a string value, the component automatically changed the datatype to Unicode, even if it was originally string. This meant any time you changed an Expression that returned a non-unicode string, you had to change it back.

In SQL2008 it has got worse! Now the same change of data type is forced on you, but now the only way to change the data type back is to go into the Advanced Editor and amend the data type of the Output Column there, because – apparently by design – the data type is now locked in the normal UI.

Why by design? No idea, but according to this Connect article, that’s the way it’s going to stay – please vote this up and maybe Microsoft will revisit this incredibly annoying “feature”.

Update: I’ve also discovered that even if you drag a non-unicode column from the data flow into the Expression (nothing else, just the column) it still converts it to Unicode!

12 thoughts on “SSIS Derived Column forcing data type changes

  1. Cast the value to the type you want: this forces SSIS to infer the correct type and I use it in 2005 for that reason because it saves having to change the type in the drop down.

  2. James – this is actually a good thing. Perhaps I can explain why, and how to best work with it.

    I completely understand your frustration with typing in a string literal and having the type automatically change to DT_WSTR instead of “staying” as a DT_STR – but I understand that the world is moving towards Unicode, so having it as a default is “a good thing”. So unfortunately, you’re going to have to accept that part of this issue.

    But first – the problem with SSIS 2005 that the SSIS team was attempting to “correct” was implicit type casting. What I’m talking about there is the many issues that came up where people would put an expression into the Derived Column, completely ignore the resulting type, and cast it to another type simply because they thought it should work. There’s nothing innately wrong with that – except that when they attempt to debug the problem, it’s extremely NON obvious where the “casting” error was occurring, and almost nobody ever included the information that they’d changed the type of the column when they were asking for help.

    So the team changed that behaviour so that you can’t cast results by clicking the dropdown. However, you CAN cast the results without going in to the Advanced Editor – because that would be an even more cryptic location to go! You need to cast the results using the cast operators shown in the Derived Column editor. If you want your result to be a non-Unicode string, cast it like this: (DT_STR, 1252)”Test String”. The benefits are (allegedly) many:
    1. The cast is explicit. You get what you ask for, nothing more, nothing less – no magic. C# devs love this. VB/Java devs maybe not so much…
    2. The cast is visible. You simply can’t forget that you asked for the cast operation to take place – it’s staring you in the face. This is especially important when looking at someone else’s work – the expression they give you is complete and unambiguous.

    Decide for yourself… It bothered me at first, but I prefer it to 2005 behaviour now.

  3. Thanks guys – I understand the workaround, and Todd, thanks for the explanation to the logic behind it.

    However I remain unconvinced this is a good thing – I could appreciate locking it down *once* you make your initial choice of Unicode / Non-Unicode – but by default forcing Unicode (and what really gets my goat, forcing Unicode even if the original setting was Non-Unicode when you make a change) just gets frustrating.

  4. I don’t understand what you mean by “forcing Unicode even if the original setting was non-Unicode”. You’re creating a new column, right? There isn’t an “original setting” then… and how would you “unlock” it once you’d “locked” it?

    Don’t look at it as an “initial choice” – look at it as an “explicit choice”…

  5. I find this annoying as well BI Monkey, sorry Todd. I can see what the boffins at Microsoft are trying to achieve but I have to say that if you have explicitly changed your datatype to DT_STR and then decide to change a value in your expression from “Unknown” to “Not Specified”, it is purely a simple frustration that you then have to remember to reset the data type … again.

    Its a case of Microsoft trying to presume too much … either remove the drop down list and force developers to use the cast code, or make the drop down work like a cast function and if it has been changed, retain the resulting choice … there is a reason it was changed.

    Its not a show stopper, and yes it is probably best practice to explicitly cast, but it is not cool to change a choice to a predetermined type after a user has made a choice. Just think of the outrage if users were typing a word document in Calibri and then decided to fix the spelling of a word and that one word changed to Time New Roman!

  6. @Todd, as per Paul’s comment – test for yourself – if you create a Derived Column, give it a value and set the Data Type to Non-Unicode, if you then change that value, the component forcibly changes the data type to Unicode. This is the thing that annoys me most about this (applies to 2005 & 2008)

    @Paul – nice allegory with the fonts!

  7. Silly me – I’ve never done that. I suppose once I got over the shock of not being able to change the data type after it was set in stone, I never went back to try to use that dropdown again – ever. MSFT should remove the dropdown completely – or move it to the left hand side of the expression. IMO, this won’t be an issue in the next SSIS – I believe one of the parts they’re concentrating on is UI (no proof of that). As Warren Thornthwaite of Kimball Group loves to say (sarcastically) “the next version is gonna be great” :)

  8. That’s great that the future is unicode, but right now there are plenty “legacy” (read:Operational) systems not using Unicode, and this has caused a great annoyance for ETL with those systems.

    Stupid decision by Microsoft IMHO.

  9. I am so mad at Microsoft. SSIS 2008 is a step backwards. I have a variable of type string and I am trying to use it as the value of a derived column which used to be DT_STR of size 3, but now I get errors because my variable is one type and the derived column is forced to another column. I have tried to cast the thing to DT_STR, but it will not take this cast. Why are they doing this. If I had a choice, I would start over with a better product. This stinks!!!!!!!!

  10. I agree. Bad idea. If I have billions of rows of data (which I do), I don’t want to use unicode in the database unless I have to. I takes up too much space. I understand the argument that the world is going to uinicode and that makes since if I’m in visual c++ but not in a database. Unicode is a no-no there unless you have to use it.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>