I am migrating a website from HTTP to HTTPS entirely; all HTTP URLs will have 301 redirects to their HTTPS counterparts.
From Google Webmasters answer 6033049 (sorry, not enough reputation to post links):
We reference our HTTP sitemaps in robots.txt. Should we update the robots.txt to include our new HTTPS sitemaps?
We recommend separate robots.txt files for HTTP and HTTPS, pointing to separate sitemap files for HTTP and HTTPS. We also recommend listing a specific URL in only one sitemap file.
What URLs should our sitemaps list if we have redirects (from HTTP to HTTPS or the reverse)?
List all HTTP URLs in your HTTP sitemap, and all HTTPS URLs in your HTTPS sitemap, regardless of redirects when the user visits the page. Having pages listed in your sitemap regardless of redirects will help search engines discover the new URLs faster.
From this I assume the following should be correct:
http://example.com/robots.txt should exist and have a Sitemap directive pointing to the old sitemap.xml with http urls.
https://example.com/robots.txt should exist and have a Sitemap directive pointing to the new sitemap.xml (maybe called something like sitemap_https.xml) with HTTPS URLs that are the same as the old ones but use https instead of http.
But further reading of Google's guidelines shows another approach that contradicts this one (or maybe I just misunderstood something?).
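As a rough illustration of the second point, the HTTPS sitemap could be derived from the old one by rewriting only the scheme of each `<loc>` entry. This is a minimal sketch, not anything Google prescribes: the function name `to_https_sitemap` is hypothetical, and it assumes the old sitemap is a standard `urlset` document in the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def to_https_sitemap(xml_text: str) -> str:
    """Return a copy of the sitemap with every http:// <loc> switched to https://.

    Hypothetical helper: only the scheme changes, host and path stay the same.
    """
    # Keep the default namespace unprefixed when serialising.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(xml_text)
    for loc in root.iter(f"{{{SITEMAP_NS}}}loc"):
        url = (loc.text or "").strip()
        if url.startswith("http://"):
            loc.text = "https://" + url[len("http://"):]
    return ET.tostring(root, encoding="unicode")


old_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://example.com/page</loc></url>"
    "</urlset>"
)
new_sitemap = to_https_sitemap(old_sitemap)
```

The result would be saved as the new sitemap (for example sitemap_https.xml), while the old sitemap.xml stays as it is.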
From answer 6033080:
Update your robots.txt files:
On the source site, remove all robots.txt directives. This allows Googlebot to discover all redirects to the new site and update our index.
On the destination site, submit the two sitemaps you prepared previously containing the old and new URLs. This helps our crawlers discover the redirects from the old URLs to the new URLs, and facilitates the site move.
This is how I understand this approach:
http robots.txt should exist and have no directives in it (be empty).
https robots.txt should exist and have two Sitemap directives, one pointing to the old sitemap.xml and another to the new sitemap_https.xml.
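To make this reading concrete, here is a small sketch of what the two robots.txt bodies would contain. It only reflects my interpretation of the guideline, not a confirmed one: `build_robots_txt` is a hypothetical helper, and hosting both sitemap files under https://example.com/ is an assumption.

```python
def build_robots_txt(sitemap_urls):
    """Return a robots.txt body with no crawl rules, only Sitemap lines.

    Hypothetical helper: an empty list yields an empty file (no directives).
    """
    lines = [f"Sitemap: {url}" for url in sitemap_urls]
    return "\n".join(lines) + ("\n" if lines else "")


# Source (HTTP) site: all directives removed, so the file is empty.
http_robots = build_robots_txt([])

# Destination (HTTPS) site: both sitemaps listed (assumed locations).
https_robots = build_robots_txt([
    "https://example.com/sitemap.xml",        # old sitemap with http URLs
    "https://example.com/sitemap_https.xml",  # new sitemap with https URLs
])
```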
Maybe "submit the two sitemaps" means something different from listing them in robots.txt? Like using the Search Console or something? It doesn't clarify, just "submit"...
Besides, point 1 of this approach contradicts point 1 of the first approach.
If you are keeping both HTTP and HTTPS and are not planning to redirect everything to HTTPS, then maybe Google's advice makes sense. Otherwise it seems like strange advice to me.
Presumably you want to move everything to HTTPS eventually, so you should use HTTPS URLs wherever possible. Your robots.txt file would show your HTTPS sitemap link on both http://example.com/robots.txt and https://example.com/robots.txt. Similarly, the sitemap would list HTTPS URLs on both versions.
This is much easier from a technical perspective, and will prioritise HTTPS URLs in Google.
The first approach is the correct one. We successfully migrated a high-traffic, high-ranking website from HTTP to HTTPS completely. The approach, based on Google's guidelines, was:
All HTTP URLs do a 301 permanent redirect to HTTPS.
http://www.example.com/robots.txt redirects to the HTTPS version.
The new sitemap has all HTTPS links.
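The redirect steps above can be spot-checked with a short script. This is only a sketch under assumptions: the helper names (`https_counterpart`, `is_correct_redirect`, `fetch_redirect`) are made up for illustration, and it assumes each HTTP URL should answer with a 301 whose Location is the same URL with the scheme changed to https.

```python
import http.client
from urllib.parse import urlsplit


def https_counterpart(url: str) -> str:
    """Return the same URL with its scheme replaced by https."""
    return urlsplit(url)._replace(scheme="https").geturl()


def is_correct_redirect(url: str, status: int, location) -> bool:
    """True if the response is a 301 pointing at the HTTPS counterpart."""
    return status == 301 and location == https_counterpart(url)


def fetch_redirect(url: str, timeout: float = 10.0):
    """Send one HEAD request over plain HTTP without following redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


# Live check (needs network access), e.g. for the robots.txt redirect:
# status, location = fetch_redirect("http://www.example.com/robots.txt")
# print(is_correct_redirect("http://www.example.com/robots.txt", status, location))
```

`http.client` is used here because it does not follow redirects on its own, so the 301 status and the Location header stay visible.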
Here is a good post about this from Google: