MapReduce in C# for Hadoop on Azure

There is a bewildering array of language options available for writing Mappers and Reducers (aka MapReduce). Java and Python feature heavily, and for the non-programmer the entire exercise is borderline incomprehensible.

However, a kind soul by the name of Sreedhar Pelluru has posted a simple walkthrough for building a Mapper and Reducer using C# & VS2010 for us Microsoft oriented souls, with an intended Hadoop on Azure target. The walkthrough is here: Walkthrough: Creating and Using C# Mapper and Reducer (Hadoop Streaming)
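To give a flavour of what a C# mapper for Hadoop Streaming can look like, here is a minimal sketch. It is not the code from the walkthrough: the class name WordCountMapper, the word-count task and the lower-casing of words are my own assumptions for illustration. The only things it relies on are that Hadoop Streaming feeds input lines to the mapper on standard input and, by default, treats everything up to the first tab of each output line as the key.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical word-count mapper for Hadoop Streaming.
// Illustrative only; this is NOT the code from the walkthrough.
public static class WordCountMapper
{
    // Splits one input line into words and yields a "word<TAB>1" record per word.
    public static IEnumerable<string> Map(string line)
    {
        var separators = new[] { ' ', '\t' };
        foreach (var word in line.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // Assumption: keys are lower-cased so "Hello" and "hello" count together.
            yield return word.ToLowerInvariant() + "\t1";
        }
    }

    // Reads lines until end of input and writes the mapped records.
    public static void Run(TextReader input, TextWriter output)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            foreach (var record in Map(line))
            {
                output.WriteLine(record);
            }
        }
    }

    // Hadoop Streaming talks to the mapper over stdin and stdout.
    public static void Main()
    {
        Run(Console.In, Console.Out);
    }
}
```

A matching reducer would follow the same pattern: read the sorted "word<TAB>1" lines from standard input, add up the counts for each run of identical keys, and write one "word<TAB>total" line per key.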

There are a few holes in the script, so here are the things to look out for:

  • In the section “Create and run a Map/Reduce job on HadoopOnAzure portal”, the first item suggests you run a JavaScript command to get the IP address, but doesn’t provide it until a few lines later. The command is: “#cat file:///apps/dist/conf/core-site.xml”. You can also find the IP by remoting into the cluster and running ipconfig at the command line.
  • Step 7 in the same section asks you to open hadoop-streaming.jar, and it took me a while to realise this means on the HadoopOnAzure portal, not on your local machine (so I spent quite a bit of time in misadventures trying to manipulate the file locally).
  • Error messages for job failure aren’t terribly helpful, and there’s no validation on job parameter input, so make absolutely sure that your command looks exactly like the one in step 11. Miss a double quote or mistype a path and you will get no hint that this was the source of the error.

Eventually I beat the above and achieved victory: a successful job run on HadoopOnAzure with the expected results. Next challenge: build my own data, mapper & reducer and repeat. Then get it into Hive…

