Googlebot is responsible for over 60% of all web crawls

Jul 24, 2014 13:01 GMT  ·  By

When it comes to web crawlers, Google takes the lead with a staggering 60.5 percent of all page crawls.

In a new report, Incapsula reveals that the bot traffic landscape is more complex than you might think.

For those of you who don’t know, web crawlers are bots that browse the Internet to index pages. They’re also known as web spiders, automatic indexers and web scutters. Of all the crawlers out there, Googlebot is the most intriguing.

That’s because it does more than crawl the Internet; it “facilitates knowledge exchange between billions of humans, influencing our perceptions, preferences and imaginations in more ways than we can ever comprehend.”

In short, Googlebot’s behavior is complex and hard to understand, but that hasn’t stopped people from trying.

Incapsula, for instance, observed over 400 million search engine visits to 10,000 sites over a 30-day period. During this time, 2.19 billion page crawls took place: some by Googlebot, some by other search engine crawlers and some by impostors.

“The first interesting fact about Googlebot is just how active it really is. It should come as no surprise that Googlebot is more thorough than any of its peers. However, it was interesting to see Googlebot actually crawling more pages than all other search engines combined,” Incapsula points out.

According to a chart in the report, Googlebot accounts for 60.5 percent of all crawls, while second place goes to the MSN/Bing Bot, with 24.5 percent. Baidu Spider is responsible for 4.4 percent of crawls, and Majestic12 Bot handled 3 percent. Yandex Bot is the fifth most active individual crawler, with 2.3 percent. The remaining 5.3 percent is split among all other spiders.
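As a quick sanity check on these figures, the short Python sketch below derives the share left over for all other spiders and compares Googlebot against the other named crawlers. It uses only the five percentages quoted above; the dictionary and variable names are illustrative.

```python
# Crawl shares (percent of all page crawls) as quoted from the Incapsula report
shares = {
    "Googlebot": 60.5,
    "MSN/Bing Bot": 24.5,
    "Baidu Spider": 4.4,
    "Majestic12 Bot": 3.0,
    "Yandex Bot": 2.3,
}

# Whatever the five named crawlers don't account for belongs to all other spiders.
# Rounding to one decimal place hides floating-point noise.
other = round(100 - sum(shares.values()), 1)

# Combined share of the four named crawlers that are not Googlebot
rest_named = round(sum(v for k, v in shares.items() if k != "Googlebot"), 1)

print(f"All other spiders: {other}%")
print(f"Googlebot vs. other named crawlers: {shares['Googlebot']}% vs. {rest_named}%")
```

The leftover works out to 5.3 percent, and even the other four named crawlers plus that remainder (39.5 percent in total) stay well below Googlebot’s 60.5 percent, which is consistent with Incapsula’s remark that Googlebot crawls more pages than all the others combined.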

Do popular sites get more visits from spiders?

One of the goals of the study was to investigate whether there was a connection between a site’s popularity and Googlebot’s crawl rates. “Simply put, the hypothesis here was ‘popular sites get more crawls,’” Incapsula explains.

To this end, the company created a sample group of 10,000 websites and then split it into five sub-segments based on the number of daily human visitors.

The results weren’t what some were hoping for: there is no significant correlation between the volume of human visits and the number of Googlebot visits. In other words, Googlebot doesn’t play favorites.

“This data also hints at a disconnect between the rate of Googlebot crawls and a website’s SEO performance. Typically, organic search traffic accounts for 10%-30% (or more) of a website’s total visits. Thus, if a high crawl rate would indeed translate into a higher share of organic visits, we would expect to see that reflected (at least to some extent) in our data. As it stands, our numbers show no such correlation, giving web operators and SEO pros one less thing to worry about,” Incapsula writes.

How often does Googlebot visit?

The company also set out to analyze Googlebot’s crawl patterns. After processing 210 million Googlebot sessions, it found that, on average, the bot visits a website 187 times per day and crawls 4 pages per visit.

Of course, these are just averages. In 12.5 percent of cases, for instance, Googlebot visited sites over 500 times a day, and in the most extreme case Incapsula saw 210,000 pages crawled during a single 72-hour visit.
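To put these numbers in perspective, the Python sketch below does the back-of-the-envelope arithmetic. It assumes that multiplying the two reported averages gives a rough daily page count per site; the report does not state that figure directly, and an average of products need not equal the product of averages.

```python
# Averages reported by Incapsula for Googlebot
visits_per_day = 187
pages_per_visit = 4

# Rough estimate only (assumption): average visits/day times average pages/visit
pages_per_day = visits_per_day * pages_per_visit

# The most extreme case reported: 210,000 pages in a single 72-hour visit
extreme_pages = 210_000
extreme_hours = 72
pages_per_hour = extreme_pages / extreme_hours
pages_per_minute = pages_per_hour / 60

print(f"Average: ~{pages_per_day} pages per site per day")
print(f"Extreme visit: ~{pages_per_hour:,.0f} pages/hour (~{pages_per_minute:.0f} pages/minute)")
```

By this estimate, an average site sees roughly 748 page crawls a day, while the extreme visit works out to about 2,917 pages an hour, or roughly 49 pages a minute, sustained for three days.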

The most thoroughly crawled sites appear to be those that are content-heavy and frequently updated, such as forums, news sites and online shops.

Most Googlebot visits, some 98.1 percent, come from the United States, although visits were also detected from the UK, France, Belgium, Denmark and China.

You can also read more about the malicious kind of web crawler, namely the fake Googlebots out there.
