The numbers from last September are huge, methinks

Jan 10, 2008 10:32 GMT  ·  By

No wonder Google is the Terminator of the search engine industry. Processing 20,000 terabytes of data a day to index the web, process search results and serve up ads (and that's not all it does) is an enormous workload. It must be a big competitive advantage that Google has over Yahoo, Microsoft and everybody else on the market.

The large-scale computing I'm talking about is called MapReduce. The hardware behind each job costs about 1 million dollars and, as you can see in the chart released by some of Google's engineers, the number of such jobs grew tenfold between 2004 and 2006, and then another tenfold in the period up to September 2007. In case those numbers didn't quite sink in, let me break them down a little, because this is something you really should know.

Every time a job runs, it runs on about 1 million dollars' worth of hardware. Excuse my jaw dropping to my knees at numbers like these; the only time I've ever seen one million dollars was in the movies, and I'm pretty sure those bills were all fake.

OK, now on to the 20,000 TB of data. Put another way, that is 40,000 disk drives' worth of data, assuming each drive holds 500 GB. Anyone who owns a drive that size knows that no matter how much you put on it, it looks like an empty pit you're trying to fill with paper while it's raining.

Or, from another perspective, as someone wrote in reply to Erick Schonfeld's (of TechCrunch.com) article on the topic, it's like 1,500 disk drives working perfectly in parallel for 24 hours to churn through that much data. That assumes 1.5 Gbit/s SATA drives all working [IMPOSSIBLY] perfectly at their theoretical design limit, i.e. some 150 MBytes/sec. In practice, efficiency is probably 30% of perfect or even less, so we are talking about at least 5,000 drives working in parallel for 3600 × 24 seconds, and that's just the data I/O part.
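The drive counts above are easy to double-check. The short Python sketch below reproduces them from the article's own assumptions only (20,000 TB per day, 500 GB drives, a 150 MB/s per-drive ceiling, roughly 30% real-world efficiency), using decimal units. The helper names are made up for illustration; this is a back-of-the-envelope check, not a model of how Google's clusters actually work.

```python
import math

# Inputs taken from the article; decimal units (1 TB = 10**12 bytes).
DATA_PER_DAY_BYTES = 20_000 * 10**12   # 20,000 TB processed per day
DRIVE_CAPACITY_BYTES = 500 * 10**9     # 500 GB per drive
SECONDS_PER_DAY = 3600 * 24            # 86,400 seconds

# SATA 1.5 Gbit/s uses 8b/10b encoding, so 10 line bits carry one data byte:
# 1.5e9 / 10 = 150 MB/s theoretical ceiling per drive.
SATA_LINE_RATE_BITS = 1.5e9
PEAK_BYTES_PER_SEC = SATA_LINE_RATE_BITS / 10


def drives_to_store(data_bytes: float, capacity_bytes: float) -> int:
    """Hypothetical helper: drives needed just to hold the data."""
    return math.ceil(data_bytes / capacity_bytes)


def drives_to_stream(data_bytes: float, seconds: float,
                     per_drive_rate: float, efficiency: float = 1.0) -> int:
    """Hypothetical helper: drives reading in parallel to move the data in time."""
    required_rate = data_bytes / seconds  # aggregate bytes per second
    return math.ceil(required_rate / (per_drive_rate * efficiency))


if __name__ == "__main__":
    print("Drives to store one day of data:",
          drives_to_store(DATA_PER_DAY_BYTES, DRIVE_CAPACITY_BYTES))
    print("Drives at 100% efficiency:",
          drives_to_stream(DATA_PER_DAY_BYTES, SECONDS_PER_DAY, PEAK_BYTES_PER_SEC))
    print("Drives at 30% efficiency:",
          drives_to_stream(DATA_PER_DAY_BYTES, SECONDS_PER_DAY, PEAK_BYTES_PER_SEC, 0.30))
```

Run as a script, it reports 40,000 drives to hold a day's data, 1,544 drives streaming at the theoretical limit and 5,145 at 30% efficiency, which matches the rounded "1,500" and "5,000" figures in the reader's comment.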

Now, does that change the way you look at Google?