There are situations in which you do not want certain URLs of your website to be indexed by search engine robots. Fortunately, solutions exist if you need to remove outdated content such as entire web pages, individual images and more.
The general methods for blocking search engine robots rely on robots.txt files, robots meta tags or a .htaccess file. If you want to remove web content from Google, or to prevent the indexing of a website or parts of it, the recommended options are to create a custom robots.txt file or to add robots meta tags to the HTML code of your pages. To block Google's robots from any further indexing, the robots.txt file, which must be placed in the root of your domain, has to contain the following two lines:
User-agent: Googlebot
Disallow: /
The directive specified in the robots.txt file prevents Googlebot from indexing the entire website. In a similar manner, you can restrict the rule to a single directory of your website. Alternatively, you can insert a robots meta tag such as <meta name="robots" content="noindex, nofollow"> into the HEAD of your HTML pages.
The meta tag in this example (content value "noindex, nofollow") tells Google's robots not to index the current web page and not to follow the links on that page. After you have defined a robots.txt file or a meta tag that blocks indexing by Google's robots, go to the Google Webmaster Tools website (now Google Search Console) and select the content you want to remove from the Google index. In a similar way, you can also remove the cached copy of a Google search result.
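Before requesting a removal, it can help to confirm that the robots.txt rules behave as intended. The following is only a minimal sketch, using the urllib.robotparser module from Python's standard library to evaluate the rules from a string instead of fetching them from a live site. The example.com URLs and the /private/ directory are assumptions chosen for illustration, not values from this article.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The two-line file shown earlier: Googlebot is shut out of the whole site.
site_wide = build_parser("User-agent: Googlebot\nDisallow: /\n")

# Hypothetical variant that blocks a single directory (the path is an assumption).
directory_only = build_parser("User-agent: Googlebot\nDisallow: /private/\n")

if __name__ == "__main__":
    urls = (
        "https://example.com/index.html",
        "https://example.com/private/report.html",
    )
    for label, parser in (("site-wide", site_wide), ("directory-only", directory_only)):
        for url in urls:
            # can_fetch() reports whether the named robot may request the URL.
            print(label, url, parser.can_fetch("Googlebot", url))
```

Note that this only models robots that choose to honor robots.txt; it says nothing about what an ill-behaved crawler will actually do.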
It is good to know that certain robots do not respect the directives defined in robots.txt or in META tags. To better protect the privacy of your website content, it is recommended to use .htaccess files to block such spiders.
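The idea behind such server-side blocking is to inspect the User-Agent header of each request and refuse the ones that match unwanted robots. The sketch below is not an .htaccess file; it illustrates the same matching logic in Python as a small WSGI wrapper, under the assumption that your site runs behind a WSGI application. The names BLOCKED_AGENTS, is_blocked and block_spiders, as well as the robot names, are hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical User-Agent substrings of unwanted robots (assumptions).
BLOCKED_AGENTS = ("badbot", "evilscraper")


def is_blocked(user_agent: str, blocked: Iterable[str] = BLOCKED_AGENTS) -> bool:
    """Return True when the User-Agent contains a blocked substring (case-insensitive)."""
    agent = user_agent.lower()
    return any(name in agent for name in blocked)


def block_spiders(app: Callable) -> Callable:
    """Wrap a WSGI app so that blocked robots receive 403 Forbidden."""
    def wrapper(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Every other request is passed through to the wrapped application.
        return app(environ, start_response)
    return wrapper
```

Unlike robots.txt or a meta tag, a check of this kind is enforced by the server rather than left to the goodwill of the robot, although a robot that sends a false User-Agent header can still slip through.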