MapReduce: A major step backwards

apathy · on Jan 18, 2008

What an incredible piece of shit.

SCREWDRIVERS POORLY SUITED FOR DRIVING NAILS! FILM AT 11!

Right tool for the job. The right tool for accessing half a petabyte of data on unreliable Xeon-based servers in bakery racks is NOT a distributed database. MapReduce... is.

Oh, and one last thing -- he should be comparing BigTable with a standard RDBMS if he wanted to have so much as a single shred of credibility. Google uses traditional RDBMSes internally -- but not for the heavy lifting of indexing and caching. Because an RDBMS is a shitty tool for that job. That's one reason why the founders are rich as all get out -- they didn't worry about which tourniquet to use while the patient bled out. Stupid fucking religious wars.

yrashk · on Jan 18, 2008

I second that "piece of shit" -- RDBMSes are bad tools for some jobs, and there is still a lot of room for specialized kinds of databases; in particular, I believe there is a need for a lightweight (and ^^ slow ^^ :) decentralized document-oriented database. And yes, again, this kind of database will not suit every need, only some domains.

bayareaguy · on Jan 18, 2008

This is classic Stonebraker. It's a favorite strategy of his to say something so offensive to the audience that they will be itching for a rebuttal. Then on closer inspection you discover that it's much ado about nothing.

Here is one example: he says Teradata used MapReduce techniques 20 years ago and he's right (Teradata's ability to handle arbitrarily large data sets with hash partitioning helped WalMart get to where it is today). A Teradata system makes additional assumptions regarding data placement that MapReduce doesn't. In a Teradata system the data is actually stored at the processing nodes. If you take the idea of MapReduce and "extend" it with an optimizer that knows where the data is and how it is organized, you get pretty close to the Teradata architecture.

Voilà: an RDBMS based on MapReduce.
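To make the data-placement point concrete, here is a rough sketch in Python. It is a toy under stated assumptions, not how Teradata or Google's MapReduce is actually implemented: the names `partition` and `local_aggregate`, the `sales` rows, and the use of `zlib.crc32` as the hash are all made up for illustration. The idea is only that when rows are hash-partitioned on the grouping key, every row for a given key sits on one node, so each node can map and reduce its own data and the results merge without a shuffle.

```python
import zlib
from collections import defaultdict

def partition(rows, key, num_nodes):
    """Hash-partition rows across nodes by the value of `key`."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 is stable across runs (Python's built-in hash() is salted for str).
        node_id = zlib.crc32(str(row[key]).encode()) % num_nodes
        nodes[node_id].append(row)
    return nodes

def local_aggregate(local_rows, key, value):
    """Map + reduce on one node: sum `value` per `key` over local rows only."""
    totals = defaultdict(int)
    for row in local_rows:
        totals[row[key]] += row[value]
    return dict(totals)

# Hypothetical data, for illustration only.
sales = [
    {"store": "A", "amount": 10},
    {"store": "B", "amount": 5},
    {"store": "A", "amount": 7},
]

nodes = partition(sales, "store", num_nodes=4)

# Because the data is partitioned on the grouping key, no key appears on
# two nodes, so the per-node results can be merged by plain dict update.
totals = {}
for local_rows in nodes:
    totals.update(local_aggregate(local_rows, "store", "amount"))
```

If the rows were spread across nodes at random instead, the same key could show up on several nodes and you would need a shuffle step before the reduce, which is roughly the part that plain MapReduce does not assume away.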

Now look closely at what he's saying: in terms of distributed database research, MapReduce is a step backwards.

Well, he may be right there. MapReduce is not likely to help you win any database research grants - it was old news to that community 20 years ago.

weel · on Jan 18, 2008

To be entirely honest with y'all, I haven't carefully read all of this article. But I already have my opinion ready!

It's true that MapReduce is, in a sense, a step back. But it's a step back from a dead-end alleyway, a step away from a particular very complex and bloated programming model--to wit the SQL-based RDBMS--that is good as far as it goes, but requires so much effort to implement that it is unlikely to be a great model to use as the basis for innovative implementations.

If you're going to do crazy shit of the sort that Google does, massively distributed computations on very high volumes of data, then in order to do it well you have to keep it simple. Such is life. There are plenty of RDBMSes that can do the full whammy of RDBMS features, but try to take, say, one that does not allow for replication and then implement replication. See you next decade.

Sometimes taking a step back can help put things into perspective, is all I'm saying.

anonym · on Jan 18, 2008

[Note: Although the system attributes this post to a single author, it was written by David J. DeWitt and Michael Stonebraker]

Oh, the irony.