Aug 8, 2011 13:30 GMT

Twitter, like many of the new tech superstars, relies heavily on open-source technology and, in return, has open-sourced a number of its own projects. Facebook is another prime example, though on a larger scale.

Twitter announced last week that it will open-source Storm, a large-scale, real-time data processing technology similar in some ways to Hadoop, and that it will make the code available next month.

Storm is not a home-grown technology; it came courtesy of Twitter's BackType acquisition. The startup had already promised that it would open-source Storm, and Twitter has now set a date.

There's no shortage of open-source tools for huge data sets, but Storm does have some characteristics that set it apart and make it more suitable for certain tasks than the existing alternatives.

The main differentiating factor is that Storm processes continuous streams of data, unlike tools such as Hadoop, which run one finite job at a time. A task in Storm keeps running until you tell it to stop.
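To make the difference concrete, here is a minimal single-process sketch in Python. It is not Storm's API, and the function names `batch_count` and `streaming_count` are made up for illustration. The batch version takes a finite data set and returns one answer, while the streaming version updates its state with every new item and keeps emitting results for as long as the source produces data.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_count(words: list[str]) -> Counter:
    """Batch style: read a finite data set, return one result, then finish."""
    return Counter(words)


def streaming_count(words: Iterable[str]) -> Iterator[Counter]:
    """Stream style: update state per item and emit a snapshot each time.

    The loop only ends if the source ends; an endless source means an
    endless computation, which is the behavior described above.
    """
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield counts.copy()


# A short, finite list stands in for the stream so the example terminates.
snapshots = list(streaming_count(["storm", "hadoop", "storm"]))
```

Each entry in `snapshots` is the running total after one more item arrived, so a database or dashboard fed from it would always reflect the latest data rather than the result of the last completed job.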

"Stream processing: Storm can be used to process a stream of new data and update databases in realtime," Nathan Marz, BackType's former lead engineer now working at Twitter, explained. His blog post goes into further technical detail if you're interested.

"Unlike the standard approach of doing stream processing with a network of queues and workers, Storm is fault-tolerant and scalable," he said.

"Continuous computation: Storm can do a continuous query and stream the results to clients in realtime. An example is streaming trending topics on Twitter into browsers," he added.

"Distributed RPC: Storm can be used to parallelize an intense query on the fly. The idea is that your Storm topology is a distributed function that waits for invocation messages. When it receives an invocation, it computes the query and sends back the results," he explained.
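The distributed RPC idea in the last quote can be approximated with a rough sketch, again in plain Python rather than Storm's own API. Everything here is an assumption made for illustration: the `SHARDS` data, the `serve` and `invoke` helpers, and the use of threads in one process instead of machines in a cluster. A long-running worker waits for invocation messages, fans each query out in parallel, and sends the combined result back to the caller.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitioned data set; a real cluster would spread it over machines.
SHARDS = [[3, 8, 1], [9, 4], [7, 2, 6]]


def serve(requests: queue.Queue) -> None:
    """Wait for invocation messages and answer each one on its reply queue."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        while True:
            message = requests.get()
            if message is None:  # shutdown signal
                break
            threshold, reply = message
            # Fan the query out over all shards in parallel, then combine.
            partials = pool.map(
                lambda shard: sum(1 for x in shard if x > threshold), SHARDS
            )
            reply.put(sum(partials))


def invoke(requests: queue.Queue, threshold: int) -> int:
    """Client side: send an invocation and block until the result arrives."""
    reply: queue.Queue = queue.Queue()
    requests.put((threshold, reply))
    return reply.get(timeout=5)


requests: queue.Queue = queue.Queue()
server = threading.Thread(target=serve, args=(requests,))
server.start()

result = invoke(requests, 5)  # how many values across all shards exceed 5?

requests.put(None)  # tell the worker to stop
server.join()
```

The caller sees an ordinary function call, while the work behind it is split across workers. The sketch leaves out fault tolerance and scaling across machines, which are the properties Marz highlights.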

Twitter hasn't said much about how it plans to integrate Storm into its infrastructure, but it is probably the site with the most use cases for a tool that can process a continuous stream of data.