The new indexing system is finally live for everyone

Jun 9, 2010 08:27 GMT

Google has finally rolled out Caffeine, its updated indexing system, which it first started testing last summer. The new system promises fresher results, with significantly less delay between when a page is published and when it shows up in Google searches. It’s also better at processing the enormous amounts of information a search engine has to handle. Most of the changes matter mainly to Google internally and, to a degree, to webmasters, but in the end they affect every user through the quality of the search results.

“Today, we're announcing the completion of a new web indexing system called Caffeine. Caffeine provides 50 percent fresher results for web searches than our last index, and it's the largest collection of web content we've offered. Whether it's a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was possible ever before,” Carrie Grimes, software engineer at Google, wrote in a blog post.

The biggest change is in the way the Google indexing system works. Before, content on the web was divided into groups, or layers, each with its own ‘priority’ for indexing. Most websites would end up in the ‘main’ layer, which would get updated every couple of weeks or so.

This is because the old system worked in sequence: it would first crawl all the web pages, then process them, and finally make them available for search. This meant that a page, even if it had already been analyzed, had to wait until the system got through all of the pages in the batch before ending up in the search index.

With Caffeine, the web is analyzed in small portions, so the index is updated continuously. This means that all web pages, not just ‘real-time’ content, should become available for search faster than before.
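
The difference between the two approaches can be illustrated with a toy sketch. It is not Google's implementation: the function and class names, the example URLs and the simple word-to-page index are all assumptions made for illustration. The batch version publishes nothing until every page has been processed, while the incremental version makes each page searchable as soon as it is added.

```python
from collections import defaultdict


def tokenize(text):
    """Split page text into lowercase words (a deliberately naive tokenizer)."""
    return text.lower().split()


def batch_index(pages):
    """Batch style: process every page first, then publish the whole index."""
    staged = [(url, tokenize(text)) for url, text in pages]  # nothing searchable yet
    index = defaultdict(set)
    for url, words in staged:
        for word in words:
            index[word].add(url)
    return index  # visible to searchers only once the full batch is done


class IncrementalIndex:
    """Incremental style: each page joins the live index as soon as it is processed."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, url, text):
        for word in tokenize(text):
            self.index[word].add(url)

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


# Hypothetical pages, used only to exercise the sketch.
pages = [
    ("a.example/news", "Caffeine launches today"),
    ("b.example/blog", "Notes on the Caffeine index"),
]

live = IncrementalIndex()
live.add(*pages[0])
first_hit = live.search("caffeine")  # the first page is already searchable here
live.add(*pages[1])
```

In the batch version, a page processed early still waits for the slowest page in its batch; in the incremental version, its delay depends only on its own processing.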

Caffeine has been rolled out gradually. It was deployed in one data center before the end of 2009 and now powers all Google searches. To give you an idea of the amount of content Caffeine processes, Google has also released a few stats and some comparisons.

“Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles,” the blog post read.
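
The figures in the quote can be checked with some quick arithmetic. The sketch below uses only the numbers Google gives, plus the standard miles-to-meters conversion. The variable names are made up for illustration, and "nearly 100 million" and "more than 40 miles" are treated as exact values, so the results are rough.

```python
TOTAL_GB = 100_000_000      # "nearly 100 million gigabytes"
IPODS = 625_000             # "625,000 of the largest iPods"
MILES = 40                  # "more than 40 miles"
METERS_PER_MILE = 1609.344  # standard conversion factor

# Storage each iPod would need to hold.
gb_per_ipod = TOTAL_GB / IPODS

# Length each iPod would contribute to the end-to-end stack, in centimeters.
cm_per_ipod = MILES * METERS_PER_MILE * 100 / IPODS

print(gb_per_ipod)            # 160.0
print(round(cm_per_ipod, 1))  # 10.3
```

That works out to 160 GB per device and about 10 cm of length each, which looks plausible for a full-size iPod of the time.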