Via Query-Dependent Ranking

May 26, 2008 11:36 GMT  ·  By

When Microsoft took its sight of Yahoo, Chairman Bill Gates explained that the software giant would grow its search engine and online advertising businesses organically, focusing on innovation. The past year, the Redmond company poured in excess of $7 billion into research and development, and a potion of the funds went to improving Live Search. In this context, Roy Levin, Director, Silicon Valley Lab, explained that Query-Dependent Ranking, a research project, managed to produce double the relevance in web queries. In terms of Internet search, relevance is a critical aspect, one that Microsoft is continually striving to improve for Live Search to compete on par with rivals Google and Yahoo.

"You go to a search engine, you type in a bunch of words, you get back a bunch of query results, and you'd very much like to have those ordered by relevance so that the ones at the top are the ones of most importance or interest to you," Levin said. "The way that this was solved in search engines involves a variety of ranking algorithms, or ranking results. The classical way to do this, the page rank algorithm, uses the static structure of the Web graph. That is to say, it computes the relative interest of pages without paying attention at all to what your query was. What happens is that the query selects some subset of the pages, and then they're ranked based on this static notion."

Another approach to focusing exclusively on the static structure of the Web graph is to also take into consideration the actual information included in users' queries. Such a strategy would bridge the query results compiled in a ranking system, with the actual pages returned by the search engines. The implementation of a system which understands the relationship between query information and items in the Web graph has so far failed to be successful. But Microsoft Research has apparently found a receipt which works.

Levin explained that the Query-Dependent Ranking project implements "some very specialized infrastructure to make it possible to analyze a variety of those algorithms at scale, which means having a representation of the Web graph, very large number of pages, that is extremely efficient to probe and to ask questions about, like where are all the pages that are two away from this one, either in the forward or the backward direction of the hyperlinks. Once you have a system like that, you can then go and analyze these graph algorithms, and discover what works best."

Microsoft's implementation demonstrated that specific algorithms can even double the relevance performance. According to Levin, incorporating query-specific data into the results delivers a superior user experience in terms of relevance.

"It may not seem like a lot, but people kill for a few percent in this area. So something that actually doubles the relevance is extremely attractive. And so what that tells us is that this query dependent ranking thing really does work, and now you can afford to go and invest in building infrastructure that will actually support that on a production scale," Levin added.