Allow Google to crawl the domain root (home page) while disallowing all other pages in robots.txt

by Matthew Hui   Last Updated August 31, 2019 04:04 AM

I want to allow one specific HTML file and my site's index page to be indexed by search engines; everything else should be disallowed. My home directory does not actually contain an index file; I am using .htaccess to redirect to /cgi-bin/index.cgi. I am currently using this:

User-agent: * 
Allow: /cgi-bin/index.cgi
Allow: /contact.html  
Disallow: / 
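The rules above can be sanity-checked locally with Python's standard-library urllib.robotparser. Note that this parser uses first-match semantics (the original standard), so the Allow lines must appear before Disallow: / for it to agree with Google's longest-match interpretation on these paths; example.com is a placeholder domain:

```python
from urllib import robotparser

# The robots.txt from the question, Allow lines first.
ROBOTS_TXT = """\
User-agent: *
Allow: /cgi-bin/index.cgi
Allow: /contact.html
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("*", "http://example.com/cgi-bin/index.cgi"))  # True
print(rp.can_fetch("*", "http://example.com/contact.html"))       # True
print(rp.can_fetch("*", "http://example.com/secret.html"))        # False
# The bare root "/" matches only Disallow: /, so the home page URL
# itself is still blocked -- which is what Webmaster Tools complains about.
print(rp.can_fetch("*", "http://example.com/"))                   # False
```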

However, Google Webmaster Tools is reporting:

Googlebot is blocked from

Is there a way of allowing indexing of the root while blocking all other files?

Tags: robots.txt

2 Answers

Maybe try it the other way round: put the Disallow before the Allow.

If the Wikipedia article on robots.txt is correct, it should work:

While by standard implementation the first matching robots.txt pattern always wins, Google's implementation differs in that Allow patterns with equal or more characters in the directive path win over a matching Disallow pattern.[8] Bing uses the Allow or Disallow directive which is the most specific.[9]
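The longest-match behaviour described in that quote can be sketched in a few lines of Python. This is a toy resolver, not Google's actual code, and the prefix-only matching ignores Google's * and $ wildcards:

```python
def googlebot_allows(rules, path):
    """Toy version of Google's rule resolution: the matching pattern
    with the most characters wins, and Allow beats Disallow on a tie.
    `rules` is a list of (directive, pattern) pairs in robots.txt order."""
    best = None  # (pattern length, directive is Allow)
    for directive, pattern in rules:
        if path.startswith(pattern):  # simplified: no * or $ wildcards
            candidate = (len(pattern), directive == "Allow")
            if best is None or candidate > best:
                best = candidate
    return best is None or best[1]  # no matching rule means allowed

rules = [("Allow", "/cgi-bin/index.cgi"),
         ("Allow", "/contact.html"),
         ("Disallow", "/")]

print(googlebot_allows(rules, "/contact.html"))  # True: 13 chars beat 1
print(googlebot_allows(rules, "/private.html"))  # False: only "/" matches
```

Because only pattern length matters here, the order of the Allow and Disallow lines makes no difference to this resolver, which is the point of the quoted passage.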

Pekka 웃
August 27, 2011 21:10 PM

As suggested by Pekka, you may want to try placing the Allow directives after the Disallow directives.

But given the differences in interpretation between Google, Bing, and others, you may want to use a robots meta tag instead. It is safer and more granular.

In your disallowed pages:

<meta name="robots" content="noindex" />

In your allowed pages:

<meta name="robots" content="index" />

(to be placed inside your <head> element)


Arnaud Le Blanc
August 27, 2011 21:19 PM
