Archive for January 20th, 2008

MapReduce has taken some criticism from Stonebraker and DeWitt. I found this particular quote interesting:

we are amazed at the hype that the MapReduce proponents have spread about how it represents a paradigm shift in the development of scalable, data-intensive applications

Personally, I’ve not heard any such claim from the MapReduce community (and who is that, anyway?).

I’ve always seen MapReduce as nothing more than a useful tool for processing massive amounts of file-based data in an ad-hoc fashion. This ad-hoc requirement is significant to me because DBMS’en have a tendency toward organizing data into static structures:

The DBMS community learned the importance of schemas, whereby the fields and their data types are recorded in storage. More importantly, the run-time system of the DBMS can ensure that input records obey this schema. This is the best way to keep an application from adding “garbage” to a data set. MapReduce has no such functionality, and there are no controls to keep garbage out of its data sets. A corrupted MapReduce dataset can actually silently break all the MapReduce applications that use that dataset.

These static structures can make it difficult to change things in support of new modes of usage. For example, one doesn’t lightly change index terms within a DBMS, especially for large amounts of data. Perhaps the breadth of MapReduce queries run at Google would make regular index changes essential, hence the choice to avoid a DBMS approach.
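To make the ad-hoc point a little more concrete, here is a toy, single-process sketch of the map/reduce style in Python. It is purely illustrative and is not Google's implementation or API: the helper names (`parse`, `map_phase`, `reduce_phase`), the three-field log format and the sample data are all assumptions of mine. The same raw lines answer two unrelated questions without any schema or index being declared up front, and the only thing keeping the garbage line out of the results is the mapper's own check, which is exactly the weakness the quote above complains about.

```python
from collections import defaultdict

# Hypothetical raw input: "path status bytes" per line, plus one garbage line.
LOG_LINES = [
    "/index.html 200 512",
    "/about.html 200 128",
    "/missing 404 0",
    "corrupted line",
    "/index.html 200 512",
]

def parse(line):
    """Interpret a raw line at read time; return None if it is malformed."""
    fields = line.split()
    if len(fields) != 3 or not fields[2].isdigit():
        return None
    return fields[0], fields[1], int(fields[2])

def map_phase(lines, mapper):
    """Apply the mapper to every line and stream out (key, value) pairs."""
    for line in lines:
        yield from mapper(line)

def reduce_phase(pairs, reducer):
    """Group values by key, then collapse each group with the reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}

def status_mapper(line):
    # Ad-hoc question 1: how many hits per status code?
    record = parse(line)
    if record is not None:
        yield record[1], 1

def bytes_mapper(line):
    # Ad-hoc question 2: how many bytes served per path?
    record = parse(line)
    if record is not None:
        yield record[0], record[2]

hits_by_status = reduce_phase(map_phase(LOG_LINES, status_mapper), sum)
bytes_by_path = reduce_phase(map_phase(LOG_LINES, bytes_mapper), sum)
```

Asking a third question means writing a third mapper, with nothing to change in how the data is stored. The flip side is that a mapper which forgets the `None` check will fail or quietly miscount on the corrupted line.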

I wonder just how much the authors assumed about the way things are done at Google and what kind of “queries” are run there. Consider that GFS, which apparently underpins MapReduce, is focused on append-only workloads. That is not the sort of thing one sees in the DBMS world, which accounts for updates and inserts amongst other things.

Many in the industry have indulged in the bad habit of seeing the DBMS as the data storage equivalent of a Swiss Army Knife. It is the universal hammer for all data storage and analysis nails, regardless of the actual requirements. Could Stonebraker and DeWitt have gotten caught up in this “classic mistake”? Surely not, given what they’ve said in the past?


Disclaimer: This is a personal blog. The views and opinions expressed here represent my own and not those of the people, institutions or organisations that I may or may not be related with unless stated explicitly.