The company was granted a new patent to improve its technology

Jan 6, 2007 12:36 GMT

Duplicate content is a major problem for all search engines, and Google is one of the companies continuously fighting to filter out near-identical websites. The search giant has already started removing websites that are reported to contain duplicate content, but this is a hard job because it is difficult to check every site currently in Google's index. Things may change soon, because the company was granted a new patent covering a method that could help Google identify duplicate content more quickly and easily.

Google was granted US patent number 7,158,961 on January 2, 2007; the application was filed on December 31, 2001. It is entitled "Methods and apparatus for estimating similarity" and it says that "A similarity engine generates compact representations of objects called sketches. Sketches of different objects can be compared to determine the similarity between the two objects.

The sketch for an object may be generated by creating a vector corresponding to the object, where each coordinate of the vector is associated with a corresponding weight. The weight associated with each coordinate in the vector is multiplied by a predetermined hashing vector to generate a product vector, and the product vectors are summed. The similarity engine may then generate a compact representation of the object based on the summed product vector."
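To make the quoted description more concrete, here is a minimal Python sketch of the general idea, not the patent's actual implementation. Several details are assumptions made for illustration only: word counts serve as the coordinate weights, each word's "predetermined hashing vector" is a row of +1/-1 values derived from SHA-256, the compact representation keeps only the sign of each summed coordinate, and two sketches are compared by counting matching bits. The names `BITS`, `hashing_vector`, `sketch` and `similarity` are hypothetical.

```python
import hashlib
from collections import Counter

BITS = 64  # sketch length; an assumption, the quoted text does not fix a size


def hashing_vector(feature: str) -> list[int]:
    """Return a fixed +1/-1 vector for a feature (assumed: derived from SHA-256)."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    value = int.from_bytes(digest[:BITS // 8], "big")
    return [1 if (value >> i) & 1 else -1 for i in range(BITS)]


def sketch(text: str) -> int:
    """Build a BITS-wide integer sketch of a text."""
    # Vector for the object: one coordinate per word, weighted by its count.
    weights = Counter(text.lower().split())
    total = [0] * BITS
    for feature, weight in weights.items():
        # Product vector: the weight times the feature's hashing vector,
        # accumulated into the running sum.
        for i, h in enumerate(hashing_vector(feature)):
            total[i] += weight * h
    # Compact representation (assumed): one bit per coordinate, set when
    # the summed value is positive.
    bits = 0
    for i, s in enumerate(total):
        if s > 0:
            bits |= 1 << i
    return bits


def similarity(a: int, b: int) -> float:
    """Fraction of bit positions on which two sketches agree."""
    differing = bin(a ^ b).count("1")
    return 1.0 - differing / BITS


if __name__ == "__main__":
    page_a = sketch("google was granted a new patent on duplicate content")
    page_b = sketch("google was granted a patent on duplicate content detection")
    print(f"similarity: {similarity(page_a, page_b):.2f}")
```

Under these assumptions, two pages that share most of their words tend to produce sketches that agree on most bits, so a search engine could compare fixed-size 64-bit values instead of comparing full documents.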

The duplicate content problem has brought Google several complaints from users whose sites were removed from its index after being identified as containing duplicate content. If your site is removed from Google's index, you should contact the company and submit a reinclusion request, which asks Googlebot to recrawl your website. If you want to see all the details of the patent granted to the search giant, you can follow the link published by the US Patent and Trademark Office.