Duplicate detection added to Google News

Sep 3, 2007 08:34 GMT  ·  By

Google's news service has received a new and long-awaited feature that should attract more readers: duplicate detection. The feature is meant to reduce the number of duplicate articles, making it easier to find the original sources of stories, such as the major Internet news agencies. This is especially useful because numerous websites republish news agency articles word for word, and Google News indexes every one of those copies. The filter shows the original by default while still keeping both the original and the duplicate articles accessible.

"Duplicate detection means we'll be able to display a better variety of sources with less duplication. Instead of 20 'different' articles (which actually used the exact same content), we'll show the definitive original copy and give credit to the original journalist. (We launched a similar feature in Sort-by-Date and got great feedback about it.) Of course, if you want to see all the duplicates on other publisher websites with additional analysis and context, they're only a click away," Josh Cohen, Business Product Manager, wrote today.

Google News currently draws content from approximately 4,500 sources, and only a small percentage of them offer original content. Google is clearly proud of the new feature and says it will improve the service considerably, since readers will be able to give more credit to the original authors.

"By removing duplicate articles from our results, we'll be able to surface even more stories and viewpoints from journalists and publishers from around the world. This change will provide more room on Google News for publishers' most highly valued content: original content. Previously, some of this content could be harder to find on Google News, and as a result of this change, you'll have easier access to more of this content, and publishers will likely receive more traffic to their original content," the Google employee added.