Yahoo's Hadoop-based mail infrastructure moves 70,000 emails per second

Oct 14, 2011 14:01 GMT

With all of the restructurings, the reorganizations, and the plans to revamp the company and focus on content, it's easy to forget that Yahoo is, at heart, a technology company. The engineers still working there offer a reminder with projects like the Yahoo Mail Visualization Project.

As the name implies, the tool provides a view of email traffic across the Yahoo network.

The interesting part is that it all runs in near real time and that it also displays the amount of spam Yahoo blocks in any given second.

The visualization is quite slick, and it also offers an interesting way to see where most Yahoo users are.

Unsurprisingly, most of the activity centers around the US and Europe, with some parts of Russia and Eastern Asia pitching in.

Africa, Australia and South America seem pretty barren when it comes to Yahoo activity.

The tool is also interesting for tracking the amount of email going through Yahoo at any given point.

For example, at the time of writing the tool shows about 70,000 emails being delivered every second, but that number jumps to more than 300,000 when you add all of the spam Yahoo blocks.
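To put those two figures in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: the function name `spam_share` is made up for illustration, and because the total is quoted as "more than 300,000", the results should be read as lower bounds rather than exact values.

```python
def spam_share(delivered_per_sec: int, total_per_sec: int) -> float:
    """Return the fraction of all incoming messages that were blocked."""
    if total_per_sec <= 0:
        raise ValueError("total_per_sec must be positive")
    return (total_per_sec - delivered_per_sec) / total_per_sec


# Figures quoted from the visualization; the total is a lower bound.
DELIVERED_PER_SEC = 70_000
TOTAL_PER_SEC = 300_000

blocked_per_sec = TOTAL_PER_SEC - DELIVERED_PER_SEC
share = spam_share(DELIVERED_PER_SEC, TOTAL_PER_SEC)

print(f"{blocked_per_sec:,} messages blocked per second")
print(f"{share:.1%} of incoming traffic is spam")
```

In other words, at least 230,000 messages are blocked every second, which means roughly three out of every four messages hitting Yahoo's servers are spam.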

"We think the technology we’re using to keep your mailboxes safe is quite awesome and we wanted to show you how we do it," Markus Weimer, from Yahoo Labs, and Andreas Neumann, Grid Architect at Yahoo, wrote.

A brand-new Yahoo Mail debuted this year, and its new spam-busting technologies were among the most touted features.

"So today we’re going live with the Yahoo! Mail Visualization Project – a view of what no one has seen before using live data … how we use cloud computing and Apache Hadoop technology to filter spam and re-route email for the 300 million mail users we have across the globe," they explained.