Robots.txt vs Noindex

How To Properly Remove a Page from Google’s Search Results

This is a very common mistake that I wanted to point out. Sometimes people think that if they block a page from being crawled in robots.txt, Google won't index it. That is not true.
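For example, a rule like the following (the path is illustrative) only stops Google from crawling the page; it does not stop the URL from being indexed:

```
User-agent: *
Disallow: /private-page.html
```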

Here are a couple of comments from Google that explain it very well.

Be careful about disallowing search engines from crawling your pages. Using the robots.txt protocol on your site can stop Google from crawling your pages, but it may not always prevent them from being indexed. For example, Google may index your page if we discover it by following a link from someone else’s site. To display it in search results, Google will need to display a title of some kind and because we won’t have access to any of your page content, we will rely on off-page content such as anchor text from other sites. (To truly block a URL from being indexed, you can use the “noindex” directive.)
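The mechanics described above can be demonstrated with Python's standard-library `urllib.robotparser` (the robots.txt contents and URL here are hypothetical): a Disallow rule stops the fetch entirely, which is exactly why any noindex tag on the page would go unseen.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a page the owner also wants deindexed.
robots_txt = """\
User-agent: *
Disallow: /private-page.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Googlebot may not fetch the page, so it can never see a noindex
# directive placed on it -- but the URL can still be indexed via links.
blocked = not parser.can_fetch("Googlebot", "https://example.com/private-page.html")
print(blocked)  # True: the crawler is locked out
```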

Important! For the noindex directive to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex directive, and the page can still appear in search results, for example if other pages link to it.
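Putting the two quotes together, the safe recipe for removing a page from search results is: leave the page crawlable (no robots.txt block) and add a noindex directive. A minimal sketch, assuming an HTML page (the directive can go in a meta tag, or equivalently in an X-Robots-Tag HTTP response header for non-HTML files):

```html
<head>
  <!-- Crawler must be able to reach this page to see the directive -->
  <meta name="robots" content="noindex">
</head>
```

Once Google has recrawled the page and dropped it from the index, you can add a robots.txt block afterward if you also want to stop crawling.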