SQL Pass Day #3

The third and final day at SQL Pass began with me at the bloggers’ table (though only able to tweet manically) watching Dr DeWitt’s keynote, and I can see why his keynotes are so highly regarded. His subject was Big Data and, given the potential for this to be a dull and impenetrable subject area, he gave a great and illuminating talk on the topic.

Topics that he covered included:

  • ACID vs BASE (i.e. the battle between consistency of data and availability of data)
  • NoSQL is a means of querying raw data with no cleansing / structure / ETL
  • His expectation is that Structured (SQL) and Unstructured (Hadoop) data will coexist in organisations
  • Hadoop consists of Storage (HDFS) and Process (MapReduce)
  • MapReduce is too complex to work with directly, so languages such as Hive and Pig sit on top of it
  • Sqoop is the tool to make Unstructured and Structured data talk – but performance is not good
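
To make the MapReduce point a little more concrete, here is a minimal sketch of the idea in plain Python. It is not Hadoop's actual API and it is not from the keynote; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own, and everything runs in memory on one machine, whereas the real thing spreads the work over files stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of raw text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key (the framework does this between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is raw"]))
```

Even a simple count needs three hand-written steps here, which gives some feel for why a higher-level language on top is attractive: in something SQL-like such as Hive, the same job would be roughly a single GROUP BY query.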

I can’t really do his talk justice, but now I understand Hadoop a whole lot better: essentially it’s just a read-only store of unstructured data, a very different beast to a relational database and addressing totally different needs.
