Google's index weighs 100 million gigabytes, for the 30 trillion pages of the web

Mar 2, 2013 13:21 GMT

Google is not known for offering much info about its search engine. In fact, it's renowned for its secrecy. The new "How Search Works" website aims to change that, in more ways than one.

The site is similar to the one describing how email works, which Google revealed last year. In fact, that site inspired the search team to put together something similar for the search engine.

An overview of how a search query works under the hood, along with plenty of other info, is presented in infographic style.

"Ask a question, get an answer. But what happens in between? Last year we released an animated site that illustrates an email's journey to friends and family around the world. Today we're releasing a similar website called How Search Works," Google wrote.

"Here you can follow the entire life of a search query, from the web, to crawling and indexing, to algorithmic ranking and serving, to fighting webspam," Google explained.

The site could prove useful, or at least interesting, to most people. Google is also publishing its Search Quality Rating Guidelines for the first time.

This document should be very useful to webmasters, though some leaked versions have circulated before.

Google indexes the 30 trillion pages of the web. That is presumably the public web, the part Google can access; the entire web is bigger than that.

Still, that's a staggering amount of stuff to keep track of, sort and make available almost instantly. The entire index of the web, as stored by Google, is 100 million gigabytes, or 100 petabytes of storage.
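
For readers who want to check the arithmetic, here is a minimal sketch of the unit conversion. It assumes decimal (SI) units, which Google does not specify, and the function names are purely illustrative. The per-page average is only a rough ratio of the two quoted figures, assuming the index covers all 30 trillion pages; it is not a number Google has published.

```python
# Figures quoted in the article. Decimal (SI) units are an assumption.
INDEX_SIZE_GB = 100_000_000          # 100 million gigabytes
PAGE_COUNT = 30_000_000_000_000      # 30 trillion pages

GB_PER_PB = 1_000_000                # 1 PB = 10^6 GB in SI units
BYTES_PER_GB = 1_000_000_000         # 1 GB = 10^9 bytes in SI units


def index_size_pb(size_gb: int = INDEX_SIZE_GB) -> float:
    """Convert the index size from gigabytes to petabytes."""
    return size_gb / GB_PER_PB


def avg_bytes_per_page(size_gb: int = INDEX_SIZE_GB,
                       pages: int = PAGE_COUNT) -> float:
    """Rough average of index bytes per page (hypothetical ratio only)."""
    return size_gb * BYTES_PER_GB / pages


if __name__ == "__main__":
    print(f"Index size: {index_size_pb():.0f} PB")
    print(f"Average per page: {avg_bytes_per_page():.0f} bytes")
```

Under these assumptions the two figures work out to a few kilobytes of index per page, which suggests the index stores a compact representation rather than full copies of pages.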

Google's algorithms then rank all of these pages depending on the query and remove the spam and the poor-quality ones. The new site actually goes into quite a bit of detail on each step: crawling and indexing, the algorithms, and quality control.
