Recommended practices to avoid duplicate content

Jun 16, 2008 15:35 GMT

The content of a website is considered duplicate when substantial parts of already published material are repeated across a number of pages, websites, and domain names. Building a site whose content is generated automatically, for instance by republishing RSS feeds as HTML pages, may sound tempting, but the consequences are often exactly what no webmaster wants.

In practice, search engine spiders detect the multiplication of the same content: they index and display only the original version and, at the same time, exclude, or even ban, the websites or web pages that consist of duplicate content.

There are also situations where identical content appears without your intention: on your own website, when other webmasters reproduce parts of your content by publishing your RSS feeds on their HTML pages, or when two versions of the same content exist on the same site, as with a printer-friendly page coexisting alongside the regular HTML page.

If the same content is published two or more times on the same website, simple solutions exist to avoid the consequences of duplicate content detection by search engine spiders.

For example, one of the two page versions (printer-friendly or regular) can be hidden from search engine spiders by blocking it, so that only one version is indexed. To do this, add a noindex meta tag to the pages that must not be indexed by the search engine's robots. The robots.txt file or the sitemap can also give the spider the correct indexing directions, as shown in the sketch below.
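
A minimal sketch of both approaches, assuming the printer-friendly copies live under a hypothetical /print/ directory: the meta tag goes into the head of each page that should stay out of the index, while the robots.txt rule keeps spiders away from the whole directory.

    <!-- In the <head> of the printer-friendly page: keep it out of the index -->
    <meta name="robots" content="noindex, follow">

    # robots.txt at the site root: block crawling of the hypothetical /print/ directory
    User-agent: *
    Disallow: /print/

Note that the two are usually alternatives rather than complements: if robots.txt blocks a page, the spider never fetches it and therefore never sees the noindex meta tag, so pick whichever method fits your setup and apply it consistently.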

As far as republished articles are concerned, the search engine spider will always look for the original source. If you do reproduce an article on another website, follow the by-now familiar advice and link back to the source, that is, the website containing the original content. Beyond that, the best way to avoid duplicate content being detected is simply to minimize the chances of it appearing in the first place.
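
As a simple illustration, assuming the original article lives at a hypothetical URL, the republished copy could close with an attribution link of this kind:

    <!-- At the end of the republished article: credit the original source (hypothetical URL) -->
    <p>Originally published at
       <a href="https://www.example.com/original-article.html">www.example.com</a>.</p>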