Feb 2, 2011 15:24 GMT
There will be only one Hadoop, not counting Cloudera's enterprise-oriented flavor

Those looking for a platform to manage huge data sets on a large scale will have an easier time choosing. Yahoo has announced that it will drop development of its own Hadoop version and instead work more closely with the Apache community, fully supporting Apache Hadoop.

The community-driven edition will eventually be adopted internally by Yahoo, which will merge into it all the improvements and changes from its own distribution.

"I'm pleased to announce that after some reflection, Yahoo! has decided to discontinue the 'The Yahoo Distribution of Hadoop' and focus on Apache Hadoop," Eric Baldeschwieler, VP of Hadoop development at Yahoo, announced.

"We plan to remove all references to a Yahoo distribution from our website, close our github repo and focus on working more closely with the Apache community," he explained.

Both distributions share common roots, but over time Yahoo began doing more of its work independently, despite having been one of the leading forces behind Apache Hadoop early on.

In 2009, Yahoo decided to open source the version it had been using internally, in an effort to help the community move forward faster. However, this actually led to a weaker ecosystem since there were now two major open source Hadoop distributions.

Development continued in parallel, leading to today's situation, in which Yahoo has decided that the only way to ensure balance is to start working on a unified distribution.

As such, Yahoo is now looking to merge its code with Apache Hadoop. The current stable version, Hadoop-0.20-sustaining, which Yahoo runs on 40,000 nodes, will be merged into the Apache Hadoop 0.20 Security branch.

Likewise, Yahoo's Hadoop-future, where new features are implemented and tested before introducing them on a large scale, will be merged into the active development branch. From there, Yahoo plans to focus solely on the Apache-backed distribution and contribute to it directly.

"In summary, our decision to discontinue the 'Yahoo! Distribution of Hadoop' is a commitment to working more effectively with the Apache Hadoop community. Our goal is to make Apache Hadoop THE open source platform for big data," Baldeschwieler concluded.