The new system brings some more advanced methods of detecting similar entries

Jul 1, 2009 06:51 GMT  ·  By
Digg introduced an improved Dupe Detection system, but it will have a 30-day trial period before becoming mandatory

Digg, the social news aggregator site, has launched an updated feature to prevent duplicate entries from being submitted. Called Dupe Detection, the system brings new detection methods that should make readers less likely to come across the same story twice. Previously, it only blocked submissions that reused the same URL within 30 days.

“To better understand the nature of the problem, we analyzed the types of duplicate stories being submitted. Most common are the same stories from the same site, but with different URLs. Our R&D team came up with a solution that identifies these types of duplicates by using a document similarity algorithm. Look for a separate tech blog post on how this works, but it has proven to be a reliable way of identifying identical content from the same source,” Chris Howard, Digg director of product, wrote in a blog post.
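Howard does not describe the algorithm here, deferring to a separate tech blog post, so the following is only a minimal sketch of one common document-similarity technique (cosine similarity over word counts), not Digg's actual method. It assumes the page text of both submissions has already been fetched; the function names and the 0.9 threshold are hypothetical.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def term_counts(text: str) -> Counter:
    """Lowercase the text and count its word tokens (a bag-of-words vector)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    # Counter returns 0 for missing terms, so unshared words add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_same_source_dupe(url_a: str, body_a: str,
                        url_b: str, body_b: str,
                        threshold: float = 0.9) -> bool:
    """Flag two submissions as duplicates when they come from the same host
    and their text is nearly identical, even if the URLs differ.
    The 0.9 threshold is a hypothetical value, not one Digg published."""
    if urlparse(url_a).netloc.lower() != urlparse(url_b).netloc.lower():
        return False
    return cosine_similarity(term_counts(body_a), term_counts(body_b)) >= threshold


text = "Digg launches improved Dupe Detection for duplicate stories"
print(is_same_source_dupe("http://example.com/story?id=1", text,
                          "http://example.com/2009/07/story", text))  # True
```

Comparing word counts rather than raw URLs is what lets two different addresses for the same article on one site be caught; restricting the check to the same host matches the "same stories from the same site" case Howard describes.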

Of course, weeding out exact copies of a story is fairly straightforward. Determining whether two entries from different sites actually cover the same story is more complicated. For those, Digg uses its new search capabilities to look for previously submitted stories with similar titles or descriptions. While this keeps the same story from showing up twice, the system cannot tell which of the two has the better content, so the one submitted first is the one that stays on the site.
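The cross-site check can be illustrated with a simple title comparison. This is a minimal sketch, not Digg's search engine: it swaps in Jaccard similarity over title words, compares titles only (descriptions could be scored the same way), and returns matches oldest first because the earliest submission is the one kept. `Story`, `find_possible_dupes`, the stopword list, and the 0.5 threshold are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical stopword list: very common words that say little about the topic.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "and", "is"}


@dataclass
class Story:
    url: str
    title: str
    submitted_at: datetime


def title_tokens(title: str) -> set:
    """Lowercased title words, minus the stopwords above."""
    return {w for w in re.findall(r"[a-z0-9']+", title.lower()) if w not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    """Fraction of words two titles share: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_possible_dupes(new_title: str, existing: list, threshold: float = 0.5) -> list:
    """Return earlier stories with similar titles, oldest first.

    The oldest match is the copy that stays on the site, so a submit form
    would show it first. The 0.5 threshold is a hypothetical value.
    """
    new_tokens = title_tokens(new_title)
    matches = [s for s in existing
               if jaccard(new_tokens, title_tokens(s.title)) >= threshold]
    return sorted(matches, key=lambda s: s.submitted_at)


existing = [
    Story("http://site-b.example/digg-dupes",
          "Digg Improves Its Duplicate Story Detection", datetime(2009, 7, 1, 9, 0)),
    Story("http://site-a.example/digg-dupes",
          "Digg improves duplicate story detection", datetime(2009, 7, 1, 8, 0)),
    Story("http://site-c.example/weather",
          "Sunny weather expected this weekend", datetime(2009, 7, 1, 7, 0)),
]
# Prints the site-a URL, then the site-b URL; the weather story shares no words.
for story in find_possible_dupes("Digg Improves Duplicate Story Detection System", existing):
    print(story.url)
```

Note that a word-overlap score like this says nothing about which article is better written, which is exactly the limitation described above: the sketch can only fall back on submission time.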

Digg also changed when a submission is checked: the site now searches for dupes before you enter a description, so you don't waste time writing one if the story turns out to have been submitted already. The changes are already live, but during a 30-day trial period only entries with the same URL will be blocked. In the meantime, Digg will track whether users choose to submit stories that the dupe system has flagged as already available.
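The trial rules above can be sketched as a small decision function. This is an illustration only: `SubmissionGate`, its arguments, and the log format are hypothetical, and treating "mandatory" as "similarity matches are blocked once the trial ends" is an assumption rather than something Digg has stated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

URL_WINDOW = timedelta(days=30)  # same-URL submissions are blocked for 30 days


@dataclass
class SubmissionGate:
    trial: bool = True
    flagged_log: list = field(default_factory=list)

    def decide(self, url: str, now: datetime,
               recent_urls: dict, similar_story_urls: list) -> str:
        """Return "block" or "allow" for a new submission.

        recent_urls: previously submitted URL -> submission time.
        similar_story_urls: URLs a (hypothetical) similarity check flagged.
        """
        seen = recent_urls.get(url)
        if seen is not None and now - seen <= URL_WINDOW:
            return "block"  # same URL within 30 days: blocked even during the trial
        if similar_story_urls:
            if self.trial:
                # Trial period: let the story through, but record that the user
                # submitted it despite the warning so the choice can be reviewed.
                self.flagged_log.append((url, list(similar_story_urls), now))
                return "allow"
            return "block"  # assumed post-trial behavior
        return "allow"


gate = SubmissionGate(trial=True)
now = datetime(2009, 7, 1, 12, 0)
recent = {"http://example.com/a": datetime(2009, 6, 20)}
print(gate.decide("http://example.com/a", now, recent, []))  # block
print(gate.decide("http://example.com/b", now, recent, ["http://other.example/x"]))  # allow
```

Keeping the similarity-based path as "allow and log" during the trial mirrors the article's point that Digg wants to observe how often users override the dupe warning before enforcing it.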