Sitemap in different domain and robots.txt crawling

by Ubermann   Last Updated February 28, 2019 23:04 PM

I have the following subdomains:

  • www.example.com
  • api.example.com

The subdomain api.example.com is supposed to be invisible to Google, so its robots.txt has a Disallow: / directive. However, I don't want any server-side processing at all on www.example.com.
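For concreteness, api's robots.txt is roughly this (standard robots exclusion syntax, blocking all crawlers from everything):

  User-agent: *
  Disallow: /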

The main site's robots.txt at www.example.com/robots.txt points to api.example.com/sitemap.xml (generating the sitemap requires server-side processing).
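The www robots.txt looks something like the sketch below (the Sitemap directive accepts a full URL, so it can reference another host; https is assumed here):

  User-agent: *
  Disallow:

  Sitemap: https://api.example.com/sitemap.xml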

The problem is that Google Search Console is complaining that the sitemap.xml is blocked by robots.txt... I presume it's reading the robots.txt on api.example.com. So I tried pointing www's robots.txt to www.example.com/sitemap.xml and putting a redirect there. No luck.

So it seems I'm forced to put an Allow: /sitemap.xml rule in api's robots.txt. Will Google get confused by this? Will it try to index the sitemap's URLs (the sitemap contains only absolute URLs pointing to www.example.com) and somehow dilute authority between the two domains? api.example.com is not registered in Search Console as a property.
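The workaround I'm considering would look roughly like this on api.example.com (an Allow rule for the sitemap file only, everything else disallowed):

  User-agent: *
  Allow: /sitemap.xml
  Disallow: /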



1 Answer


First of all, you should register api.example.com in Google Search Console (GSC). This will let you see how many pages from that subdomain are getting indexed by Google.

You can also use GSC to request that api.example.com be removed from Google entirely if you want (depending on the situation, this is not recommended).

If you don't want api.example.com to get indexed, it should not have a sitemap in the first place. Also, you should not include any api.example.com URLs in the www version's sitemap.

In other words, Google will get confused because you are telling it not to access the api. site, yet providing a sitemap that is supposed to help it index that site.
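Following that advice, the sitemap would be hosted on and referenced from the www host only, along these lines (the exact sitemap path is just an example):

  # https://www.example.com/robots.txt
  User-agent: *
  Disallow:

  Sitemap: https://www.example.com/sitemap.xml

  # https://api.example.com/robots.txt
  User-agent: *
  Disallow: /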

Tony Hsieh
October 02, 2015 16:17 PM
