Making it easier and safer to handle huge amounts of data

Jun 29, 2010 14:27 GMT

Yahoo, already Hadoop’s largest contributor, is releasing two related products as open-source projects at its annual Hadoop Summit. The two tools focus on security and workflow in Hadoop and are based on products the giant web company already uses internally. Yahoo operates the largest implementation of Hadoop on the web. The two new tools, Hadoop with Security and Oozie, will be handed over to the Apache Software Foundation.

“Our latest contributions to Hadoop include Hadoop with Security and Oozie, Yahoo!’s workflow engine for Hadoop. The addition of authentication and a robust workflow engine will further fuel the wider adoption of Hadoop. These enhancements allow Hadoop adopters to better manage their big data. We hope it opens the enterprise door even wider to cloud computing, enabling organizations of all types to realize the power of Hadoop,” Eric Baldeschwieler, Yahoo VP in charge of Hadoop, wrote.

The goal of Hadoop with Security is largely self-explanatory. It integrates Hadoop with the Kerberos authentication protocol and provides secure access to data handled with Hadoop. The tool should come in especially handy for large organizations that have multiple teams and projects sharing the same platform.

Yahoo says it probably runs into the problem of managing different access privileges for different groups more often than any other company using Hadoop. It currently has about 35,000 servers running Hadoop. As other companies extend their use of the platform, they will face the same issue, Yahoo says.

The second product Yahoo is open-sourcing today is Oozie. It’s a workflow tool designed to help administrators manage and automate Hadoop jobs. Again, this tool is especially useful for larger companies and organizations that handle very large amounts of data and run large numbers of tasks.
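To give a rough idea of what a workflow engine automates, here is a minimal Python sketch. It is not Oozie's API or configuration format. It assumes, purely for illustration, that a workflow is a set of named jobs with dependencies, and it runs each job only after the jobs it depends on have finished. The job names and the run_workflow helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(actions, depends_on):
    """Run every action once, after all of its dependencies have run.

    actions:    maps a job name to a callable (hypothetical stand-in for a job)
    depends_on: maps a job name to the set of job names it must wait for
    Returns the list of job names in the order they were executed.
    """
    order = []
    for name in TopologicalSorter(depends_on).static_order():
        actions[name]()  # a real engine would submit a cluster job here
        order.append(name)
    return order


# Hypothetical four-step pipeline; each step just records that it ran.
log = []
actions = {
    "ingest": lambda: log.append("ingest"),
    "clean": lambda: log.append("clean"),
    "aggregate": lambda: log.append("aggregate"),
    "report": lambda: log.append("report"),
}
depends_on = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}

order = run_workflow(actions, depends_on)
print(order)
```

A production tool would also have to deal with scheduling, retries and failures, which this sketch leaves out.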

The news should be great for the Hadoop community, but there are some concerns about the ecosystem building around the technology. With these kinds of tools being offered for free, it becomes harder to sell services and apps designed to help companies manage Hadoop. However, since the new tools are open-source, there’s nothing stopping third-party developers from incorporating them or their functionality into their own products. Yahoo has open-sourced other Hadoop tools and plans to offer even more.