I have the following subdomains:

api.example.com is supposed to be invisible to Google; its robots.txt has a Disallow: / directive. But I don't want any server processing at all to go on at the main site: www.example.com/robots.txt points to api.example.com/sitemap.xml (generating the sitemap requires processing).
The problem is that Google Search Console is complaining that the sitemap.xml is blocked by the robots.txt file... I presume it's reading the robots.txt on api.example.com. So I tried pointing www's robots.txt to www.example.com/sitemap.xml and putting a redirect there. No luck.
So it seems I'm forced to put an Allow: /sitemap.xml in api's robots.txt. Will Google get confused by this? Will it try to index the sitemap's URLs (the sitemap is 100% absolute URLs pointing to www.example.com) and somehow dilute authority between the two domains?
api.example.com is not registered in Search Console as a property.
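For reference, the api.example.com robots.txt I'm considering would look like this (a sketch; as I understand it, Google resolves Allow/Disallow conflicts by the most specific matching rule, so the order of the two lines shouldn't matter to Googlebot):

```
User-agent: *
Allow: /sitemap.xml
Disallow: /
```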
First of all, you should register api.example.com in Google Search Console (GSC). This will let you see how many pages from that subdomain are getting indexed by Google.
You can also use GSC to remove api.example.com from Google's results completely if you want (depending on the situation, this is not recommended).
If you don't want api.example.com to get indexed, it should not have a sitemap to start with. Also, you should not include any api.example.com URLs in the www version's sitemap.
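Concretely, the arrangement to aim for would look something like this (a sketch; serving the sitemap from www, even if it is generated elsewhere and copied or proxied over, is my assumption):

www.example.com/robots.txt:

```
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
```

api.example.com/robots.txt:

```
User-agent: *
Disallow: /
```

This way the sitemap lives on the same host as the URLs it lists, and the api. host is uniformly off-limits with no exceptions for Google to reconcile.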
In other words, Google will get confused because you are telling it not to access the api. site, yet providing a sitemap there, which is supposed to help Google index that site.
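If you do keep the Allow: /sitemap.xml approach, you can sanity-check which rule wins with Python's urllib.robotparser (a sketch against hypothetical robots.txt contents; note that Python's parser applies the first matching rule, whereas Google applies the most specific match, so Allow is listed first here to get the same outcome under both interpretations):

```python
from urllib import robotparser

# Hypothetical contents of api.example.com/robots.txt.
lines = [
    "User-agent: *",
    "Allow: /sitemap.xml",  # listed first: Python's parser is first-match-wins
    "Disallow: /",
]

rp = robotparser.RobotFileParser()
rp.parse(lines)

# The sitemap itself is fetchable; everything else on the host is blocked.
print(rp.can_fetch("Googlebot", "https://api.example.com/sitemap.xml"))  # True
print(rp.can_fetch("Googlebot", "https://api.example.com/private"))      # False
```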