Google clarifies robots.txt best practices and explains when to use robots.txt directives and noindex tags for SEO.
- Don't combine robots.txt disallow with noindex tags.
- Use noindex when you want a page crawled but not in search results.
- Use robots.txt disallow for pages that should never be crawled.
In a recent YouTube video, Google’s Martin Splitt explained the differences between the “noindex” tag in robots meta tags and the “disallow” command in robots.txt files.
Splitt, a Developer Advocate at Google, pointed out that both methods help manage how search engine crawlers work with a website.
However, they have different purposes and shouldn’t be used in place of each other.
When To Use Noindex
The “noindex” directive tells search engines not to include a specific page in their search results. You can add this instruction either with a robots meta tag in the HTML head section or with the X-Robots-Tag HTTP header.
Use “noindex” when you want to keep a page from showing up in search results but still allow search engines to read the page’s content. This is helpful for pages that users can see but that you don’t want search engines to display, like thank-you pages or internal search result pages.
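For reference, the directive takes one of two standard forms: a meta tag in the page’s HTML, or an equivalent HTTP response header (the header form is useful for non-HTML files such as PDFs):

```
<!-- Option 1: robots meta tag inside the page's <head> -->
<meta name="robots" content="noindex">

<!-- Option 2: equivalent X-Robots-Tag HTTP response header,
     sent with the response instead of placed in the HTML -->
X-Robots-Tag: noindex
```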
When To Use Disallow
The “disallow” directive in a website’s robots.txt file stops search engine crawlers from accessing specific URLs or patterns. When a page is disallowed, search engines will not crawl it, so its content is never read or indexed.
Splitt advises using “disallow” when you want to block search engines completely from retrieving or processing a page. This is suitable for sensitive information, like private user data, or for pages that aren’t relevant to search engines.
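As a minimal illustration, a disallow rule in robots.txt looks like this (the /private/ path is only a placeholder; the file is served from the root of the site):

```
# robots.txt, served from the site root (e.g. https://example.com/robots.txt)
User-agent: *
Disallow: /private/
```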
Common Mistakes to Avoid
One common mistake website owners make is using “noindex” and “disallow” for the same page. Splitt advises against this because it can cause problems.
If a page is disallowed in the robots.txt file, search engines cannot see the “noindex” command in the page’s meta tag or X-Robots-Tag header. As a result, the page might still get indexed, for example if other pages link to it, but with limited information.
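To make the conflict concrete, here is a hypothetical setup in which the two directives undercut each other (the /thank-you/ path is only an example):

```
# robots.txt
User-agent: *
Disallow: /thank-you/

<!-- On the /thank-you/ page itself: this tag is never seen,
     because the disallow rule above blocks crawlers from
     fetching the page in the first place -->
<meta name="robots" content="noindex">
```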
To stop a page from appearing in search results, Splitt recommends using the “noindex” command without disallowing the page in the robots.txt file.
Google provides a robots.txt report in Google Search Console that shows which robots.txt file Googlebot has fetched and flags any errors, helping site owners monitor how their rules affect crawling and indexing.
Why This Matters
Understanding the proper use of “noindex” and “disallow” directives is essential for SEO professionals.
Following Google’s advice and using the available testing tools will help ensure your content appears in search results as intended.