No Piggybank for Pig on Hadoop on Azure
I found this out while trying to run some Excel CSV files through Pig that had fields with line feeds in them. I discovered there was a Pig load/store function, CSVExcelStorage, that would handle them, but when I tried to use it… ah. Not there. It turns out it is a Piggybank function, and Piggybank is a set of user-contributed functions that you have to include in your Pig build. The source code is freely available (being open source and all), but I haven’t worked out how to build and use them in a Hadoop on Azure (HOA) environment.
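To make the underlying problem concrete, here is a minimal Python sketch (not Pig, and not a Hadoop on Azure workaround) of why a field with an embedded line feed trips up a line-oriented loader, and what a CSV-aware parser does instead. The sample data and the function names are made up for illustration.

```python
import csv
import io

# Hypothetical sample: the quoted field in the first data record contains a line feed.
SAMPLE = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'


def naive_line_count(text):
    """Count records the way a line-oriented loader would: one per physical line."""
    return len(text.splitlines())


def parse_records(text):
    """Parse with a CSV-aware reader that keeps quoted line feeds inside the field."""
    # newline='' stops StringIO from translating line endings, so the csv
    # module sees the raw text and decides itself where each record ends.
    return list(csv.reader(io.StringIO(text, newline='')))


if __name__ == "__main__":
    print("physical lines:", naive_line_count(SAMPLE))
    for row in parse_records(SAMPLE):
        print(row)
```

The line-based count sees one more "record" than the file really holds, because the quoted field is split in two. The CSV-aware reader returns the header plus two records, with the line feed preserved inside the field.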
I can understand why Microsoft have opted not to include these: they’re not part of the core build, they’re user contributed, and so on, all things you want to avoid when providing a massively reproducible, on-demand cloud environment. If I can work out how to include them, I’ll provide a follow-up post.