No Piggybank for Pig on Hadoop on Azure
A quick note – Pig functions from the piggybank are not available in Hadoop on Azure.
I found this out while running some Excel CSV files through Pig; some of their fields had line feeds in them. I discovered there was a Pig load/store function, CSVExcelStorage, that would handle them, but when I tried to use it… ah. Not there. It turns out to be a piggybank function. The piggybank is a set of user-contributed functions that you have to include in your Pig build. The source code is freely available (being open source and all), but I haven’t worked out how to build and use them in a Hadoop on Azure (HOA) environment.
I can understand why Microsoft have opted not to include these – they’re not part of the core build, they’re user contributed, etc. – things you want to avoid when running a massively reproducible, on-demand cloud environment. If I can work out how to include them, I’ll provide a follow-up post.
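For anyone wondering why line feeds inside fields are a problem in the first place, here is a minimal sketch in plain Python (not Pig, and not how CSVExcelStorage is implemented). The sample data and the helper names `naive_rows` and `quoted_rows` are made up for illustration. It contrasts a loader that splits on line breaks first with a quote-aware CSV parser.

```python
import csv
import io

# Hypothetical sample: the quoted comment field in record 1 contains a line feed.
SAMPLE = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'


def naive_rows(text):
    """Split on line breaks first, the way a line-oriented loader would."""
    return [line.split(",") for line in text.splitlines()]


def quoted_rows(text):
    """Parse with a quote-aware reader that keeps embedded line feeds."""
    # newline="" stops StringIO from translating line endings before csv sees them.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print("naive rows: ", len(naive_rows(SAMPLE)))
    print("quoted rows:", len(quoted_rows(SAMPLE)))
```

The naive version cuts record 1 in two at the embedded line feed, while the quote-aware version keeps it as a single row with the line feed preserved inside the field.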
Comments
One Response to “No Piggybank for Pig on Hadoop on Azure”
Trackbacks
[...] Slodge (@slodge) showed how to enable Piggybank for Hadoop on Azure in a 10/15/2012 post: I needed some time conversion scripts for my Pig on HadoopOnAzure… I looked around and could only find this http://www.bimonkey.com/2012/08/no-piggybank-for-pig-on-hadoop-on-azure/ [...]