I want to allow a certain html file and my site's index file to be indexed by search engines. Everything else should be disallowed. My home directory does not actually contain an index file, I am using .htaccess to redirect to
/cgi-bin/index.cgi. I am currently using this:
User-agent: *
Allow: /cgi-bin/index.cgi
Allow: /contact.html
Disallow: /
However, Google Webmaster Tools is saying:
Googlebot is blocked from
Is there a way of allowing indexing of the root and those two files while blocking everything else?
Maybe try it the other way round: put the Disallow directive before the Allow directives.
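For example, the reordered file might look like this (a sketch only; as noted below, support for Allow and rule ordering varies between crawlers):

```
User-agent: *
Disallow: /
Allow: /cgi-bin/index.cgi
Allow: /contact.html
```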
If the Wikipedia article on robots.txt is correct, it should work:
While by standard implementation the first matching robots.txt pattern always wins, Google's implementation differs in that Allow patterns with equal or more characters in the directive path win over a matching Disallow pattern. Bing uses the Allow or Disallow directive which is the most specific.
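You can see the standard first-match-wins behaviour for yourself with Python's built-in `urllib.robotparser`, which checks rules in file order (note this models the classic interpretation, not Google's longest-match variant):

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, with Allow lines before the Disallow.
rules = """\
User-agent: *
Allow: /cgi-bin/index.cgi
Allow: /contact.html
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# First matching rule wins, so the two Allow lines take effect
# before the blanket Disallow is reached.
print(rp.can_fetch("*", "/cgi-bin/index.cgi"))  # True
print(rp.can_fetch("*", "/contact.html"))       # True
print(rp.can_fetch("*", "/private.html"))       # False
```

Under this interpretation the question's file already does what was intended; the Webmaster Tools warning reflects Google's own, different evaluation rules.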
As suggested by Pekka, you may want to try placing the Allow directives after the Disallow directives.
But given the differences in interpretations between Google, Bing and others, you may want to use a robots meta tag instead. This will be safer and more granular.
In your disallowed pages:
<meta name="robots" content="noindex" />
In your allowed pages:
<meta name="robots" content="index" />
(to be placed in the head section of your HTML pages.)