Allow Google to crawl the domain root (home page) while disallowing all other pages in robots.txt

by Matthew Hui, Last Updated August 31, 2019 04:04 AM

I want to allow one specific HTML file and my site's index file to be indexed by search engines. Everything else should be disallowed. My home directory does not actually contain an index file; I am using .htaccess to redirect to /cgi-bin/index.cgi. I am currently using this:

User-agent: * 
Allow: /cgi-bin/index.cgi
Allow: /contact.html  
Disallow: / 

However, Google Webmaster Tools is saying:

Googlebot is blocked from http://example.com/

Is there a way of allowing indexing of the root while blocking all other files, i.e., example.com/*?

Tags: robots.txt


2 Answers


Maybe try it the other way round: put the Disallow before the Allow.

If the Wikipedia article on robots.txt is correct, it should work:

While by standard implementation the first matching robots.txt pattern always wins, Google's implementation differs in that Allow patterns with equal or more characters in the directive path win over a matching Disallow pattern.[8] Bing uses the Allow or Disallow directive which is the most specific.[9]
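Concretely, the reordered file might look like the following sketch. The Allow: /$ line is an addition based on Google's documented $ end-of-URL anchor; without it, the bare root URL / matches only Disallow: /, which would explain the Webmaster Tools warning. Other crawlers may ignore $ (or Allow) entirely:

User-agent: *
Disallow: /
Allow: /cgi-bin/index.cgi
Allow: /contact.html
Allow: /$

Under Google's most-specific-match rule, Allow: /$ is longer than Disallow: /, so it wins for the home page.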

Pekka 웃
August 27, 2011 21:10

As suggested by Pekka, you may want to try placing the Allow directives after the Disallow directives.

But given the differences in interpretation between Google, Bing, and other crawlers, you may want to use a robots meta tag instead. This is safer and more granular.

In your disallowed pages:

<meta name="robots" content="noindex" />

In your allowed pages:

<meta name="robots" content="index" />

(to be placed inside your <head> section)

See http://googlewebmastercentral.blogspot.com/2007/03/using-robots-meta-tag.html
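Note that for a crawler to see these meta tags, the pages must remain crawlable, so they should not also be disallowed in robots.txt. For responses that are not static HTML you can edit, such as CGI output, Google also honors the equivalent X-Robots-Tag HTTP header. A minimal sketch, assuming Apache with mod_headers enabled, placed in the .htaccess of a directory whose responses should stay out of the index:

# Requires mod_headers; marks every response from this directory noindex
Header set X-Robots-Tag "noindex"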

Arnaud Le Blanc
August 27, 2011 21:19
