And crawler optimization

Dec 14, 2007 11:05 GMT

At the end of September 2007, Microsoft introduced the first major evolution of Live Search since the search engine debuted under the Windows Live brand umbrella in 2005. For the Redmond company, moving to version 2.0 meant delivering enhancements across the many pieces that make up the Live Search puzzle, with particular focus placed on boosting the relevance of the results returned for user queries. Nathan Buggia, Lead Program Manager of the Microsoft Live Search Webmaster Center, indicated that the 2.0 update brought Live Search to the level of, and in some respects ahead of, rival search engines such as Google and Yahoo. Advanced cloaking detection is, in this regard, an integral part of Live Search 2.0.

"One of the biggest challenges with relevancy is how to distinguish legitimate information from various forms of search spam. This is one area that we've made especially good progress in over the last 8 months through a suite of tools that helps us detect, evaluate and manage spam. One of these tools is an extension to MSNBot, giving us an additional way to detect cloaking. (It should be noted that not all cloaking is spam related and we do our best to take this into account, however, we still don't recommend cloaking in any situation)", Buggia explained.

Cloaking is a search engine optimization technique that abuses the indexing model by serving one version of a page to a search engine's crawler and a different one to visitors, typically in order to push spam, pornography or other forms of dubious content into the results. When moving to Live Search 2.0, Microsoft also updated MSNBot in order to tackle cloaking techniques. Buggia revealed that the implemented changes generated some inherent caveats, and for the past months the Redmond company has been hard at work refining the crawler to resolve the issues it caused with the reporting metrics of some websites.
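
To make the general idea concrete, the minimal sketch below fetches the same URL twice, once with a crawler-like User-Agent and once with a browser-like one, and flags the page if the two responses differ. This is only an illustration of the principle behind cloaking detection, not Microsoft's implementation; the User-Agent strings and the raw-body comparison are assumptions made for the example.

```python
# Illustrative sketch of cloaking detection: compare what a crawler sees with
# what a browser sees for the same URL. Not Live Search's actual method; the
# User-Agent strings and the hash comparison are assumptions for this example.
import hashlib
import urllib.request

CRAWLER_UA = "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def fetch(url: str, user_agent: str) -> bytes:
    """Download the page body using the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


def looks_cloaked(url: str) -> bool:
    """Flag the page if the crawler view and the browser view differ."""
    crawler_view = fetch(url, CRAWLER_UA)
    browser_view = fetch(url, BROWSER_UA)
    # A real system would compare rendered text, links and keywords and allow
    # for legitimate differences; hashing the raw bodies is a crude stand-in.
    return hashlib.sha1(crawler_view).hexdigest() != hashlib.sha1(browser_view).hexdigest()


if __name__ == "__main__":
    print(looks_cloaked("http://example.com/"))
```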

"AdSense/Overture reporting - Initially there was a bug in our crawler that caused it to download all content on your page, including ad blocks. Distort site statistics with unfilterable bot traffic - Webmasters have also reported a high level of traffic coming from this bot, in some cases high enough to impact their logs in a statistically significant way. Pollute HTTP logs with inappropriate terms - Another unfortunate issue is that we were using a common list of keywords for our testing that was not site specific. We have tuned this list and you should no longer see any keywords used that are not related to the content of your site", Buggia added.