Ampersand (&) in actual URL and sitemap

by user109308   Last Updated September 19, 2018 07:04 AM

I have a sitemap (submitted to Google for indexing) to which I am adding URLs that contain an ampersand (&). Since '&' has to be escaped in a sitemap, I replaced '&' with '&amp;' in the sitemap URLs. My actual page URLs contain just '&'. As I am new to Google webmasters and SEO, I would like to understand whether this difference between the ampersand in the URL and in the sitemap will cause any issue. Will my pages get indexed? This may seem like a noob-level question, because I am able to access the site after replacing '&' with '&amp;' in the URL. Still, any help on this front will be highly appreciated.



Answers 2


The URLs you include in the sitemap must follow the RFC 3986 standard. As you can read in the official Google guide, & must be escaped as &amp; so you are good to go.

Nevertheless, once you submit your sitemap through Google Search Console, you will be able to see whether there is any problem with the submitted URLs.

Also, you should only include final URLs in the sitemap, which means there may be some parameterized URLs you can leave out. For example:

 - https://example.com/shoes (good one)
 - https://example.com/shoes?order=1&color=blue (remove this one)
Emirodgar
September 19, 2018 06:40 AM

if this difference of ampersand in URL and sitemap will cause any issue.

tl;dr No issue, because the URLs are the same.

Since in sitemap & has to be escaped I replaced & with &amp; ...

Your sitemap is an XML document. As with any XML document, the data values must be stored XML-entity encoded. The & character is a special character (it itself denotes the start of an XML-entity) and therefore must be encoded to negate its special meaning. This is just the way data is stored inside an XML document.

When the XML document is read by an XML parser, the data values are XML-entity decoded back to their actual values. So, &amp; becomes & when the XML document is read.

So, a URL of the form /page?foo=1&amp;bar=2 stored inside an XML document is identical to the URL /page?foo=1&bar=2 in your HTML5 document.
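The round trip described above can be checked with a short Python sketch. This is only an illustration using the standard library: the page URL is a made-up example, and a real sitemap would of course be a file with many entries rather than an inline string.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical page URL, exactly as it is used on the site (plain "&").
page_url = "https://example.com/page?foo=1&bar=2"

# Writing the sitemap: escape() turns "&" into "&amp;" (and "<", ">" likewise).
sitemap_xml = (
    f'<urlset xmlns="{SITEMAP_NS}">'
    f"<url><loc>{escape(page_url)}</loc></url>"
    "</urlset>"
)
print(sitemap_xml)  # the <loc> value contains foo=1&amp;bar=2

# Reading the sitemap: the XML parser decodes "&amp;" back to "&".
root = ET.fromstring(sitemap_xml)
loc = root.find("sm:url/sm:loc", {"sm": SITEMAP_NS}).text
print(loc)              # https://example.com/page?foo=1&bar=2
print(loc == page_url)  # True
```

Any consumer that parses the sitemap as XML sees the same URL as the one on your pages, which is why the escaped form causes no mismatch.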

My actual page URLs contain just &

In HTML5 that is perfectly OK, provided there is no ambiguity. In HTML 4.01 (and earlier), however, you would have needed to HTML-entity encode the & as &amp; in your HTML source code for the HTML to be valid. That said, browsers are very tolerant and your HTML document would most probably have still "worked".

In HTML5 you only strictly need to HTML-entity encode the & if there is an ambiguity. Take the following contrived example. We want to pass the literal string "&dollar;" in the foo URL parameter.

<!-- In an HTML document (WRONG) -->
<a href="/page?foo=&dollar;">link</a>

The desired URL is http://example.com/page?foo=&dollar;, however, the above HTML anchor results in sending the user to http://example.com/page?foo=$ - which is not the intention. To create the desired result, the & must be HTML-entity encoded to negate its special meaning, resulting in the following (correct) HTML:

<!-- In an HTML document (CORRECT) -->
<a href="/page?foo=&amp;dollar;">link</a>

It is always safer to consistently HTML-entity encode the & in your HTML-document. If you are generating your content through a CMS, then this should be automatic.
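To see the difference between the two anchors above without a browser, here is a minimal sketch using Python's standard-library HTML parser, which decodes character references in attribute values much as a browser does. The HrefCollector class and hrefs_in function are names made up for this example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the decoded href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def hrefs_in(html_source):
    collector = HrefCollector()
    collector.feed(html_source)
    collector.close()
    return collector.hrefs


# Unencoded "&": "&dollar;" is read as a character reference for "$".
wrong = hrefs_in('<a href="/page?foo=&dollar;">link</a>')
print(wrong)  # ['/page?foo=$']

# Encoded "&amp;": decodes once to "&", leaving the literal text "&dollar;".
right = hrefs_in('<a href="/page?foo=&amp;dollar;">link</a>')
print(right)  # ['/page?foo=&dollar;']
```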

I am able to access the site after replacing & with &amp; in the URL.

Presumably you mean "in the URL, in your HTML"? Because if you were to HTML-entity encode the & as &amp; in the browser's address bar (for instance), i.e. outside of an HTML context, then you would not get the expected results. For example, if you typed the following directly into the browser's address bar:

/page?foo=1&amp;bar=2

Then you would get the two URL parameters [foo] => 1 and [amp;bar] => 2, which is clearly not the intention.
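The same parsing can be reproduced with Python's urllib.parse, as a rough stand-in for what a server-side framework does with the query string. The separator argument is passed explicitly so that only & splits parameters; it assumes Python 3.9.2 or later.

```python
from urllib.parse import parse_qs, urlsplit

# "&amp;" used outside of HTML/XML: nothing decodes it, so the server
# receives it literally and splits the query string on "&" only.
bad_url = "/page?foo=1&amp;bar=2"
params = parse_qs(urlsplit(bad_url).query, separator="&")
print(params)  # {'foo': ['1'], 'amp;bar': ['2']}

# The plain "&" form yields the two intended parameters.
good_url = "/page?foo=1&bar=2"
good = parse_qs(urlsplit(good_url).query, separator="&")
print(good)  # {'foo': ['1'], 'bar': ['2']}
```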

MrWhite
September 19, 2018 12:40 PM
