I built a simple new website that went live about a month and a half ago, replacing the previous one. The site consists of new pages; for example, this is the navigation markup:
<div class="header">
  <div class="navigation menu">
    <ul>
      <li><a href="index.php">Home page</a></li>
      <li><a href="products.php">Products</a></li>
      <li><a href="reference.php">Reference</a></li>
      <li><a href="about_us.php">About us</a></li>
      <li><a href="contact.php">Contact</a></li>
    </ul>
  </div>
</div>
The problem is that Google's crawlers keep requesting the old pages, which I removed and which are no longer on the server.
I keep marking the crawl errors as fixed in Webmaster Tools, but the crawlers keep attempting to crawl those pages.
I believe the previous version of the website is cached somewhere, so is there a way to remove it? (I don't have much experience with Google Webmaster Tools.)
This may not be the best or most correct way to do it, but I had the same issue. What I finally did was redirect the old URLs to the homepage or to a 404 page. I'd have to check whether I still have the redirect in place, but I no longer get the crawl errors.
This is going to happen for a looong time. Other sites might be linking to the old URLs, which will prompt Google to crawl them, and/or Google might assume your site is just having temporary problems and be giving you the benefit of the doubt that the pages might return. Either way, Google continues to crawl old pages for a long time. It would be far worse if Google suddenly stopped crawling your pages after getting a bunch of 404s.
If the pages genuinely do not exist and there is no alternative, then it is correct to return a 404 (Not Found). Alternatively, you can return a 410 (Gone) for pages that are never going to return - that is a far stronger, more definite indication to Google that the pages aren't coming back.
Note that serving a 404 (or 410) for these pages, and having them reported as such in Google Webmaster Tools (GWT), is not a bad thing in itself and will not harm your site. The crawl errors report is private and exists for your benefit.
However, what can be bad for SEO is if other sites link to your old pages and those links would otherwise have passed PageRank. Once the old URLs return a 404, those links can no longer pass that PageRank. If you have alternative pages for the ones you removed, then 301 redirect each old URL to its new location in order to preserve your ranking, help search engines re-index your content and ... please your users. But if you have simply removed the old content and not replaced it, then you need to be prepared to take the potential SEO hit.
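The decision described above (301 when a replacement exists, 410 when the page is gone for good, 404 otherwise) can be sketched in a few lines. This is only an illustration in Python, not how your server has to do it: the old paths, the `REDIRECTS` and `GONE` tables, and the `response_for` helper are all made up for the example; only the new `.php` targets come from the question.

```python
# Hypothetical old URLs that have a replacement on the new site.
# The old paths are invented for illustration; the targets are the new pages.
REDIRECTS = {
    "/old_products.html": "/products.php",
    "/old_contact.html": "/contact.php",
}

# Hypothetical old URLs that were removed on purpose and will never return.
GONE = {"/old_news.html"}


def response_for(path):
    """Return (status_code, redirect_target_or_None) for a requested path."""
    if path in REDIRECTS:
        # Permanent redirect: lets inbound links keep passing value.
        return 301, REDIRECTS[path]
    if path in GONE:
        # Gone: a stronger signal than 404 that the page is not coming back.
        return 410, None
    # Anything else that does not exist is a plain Not Found.
    return 404, None
```

In practice the same mapping would normally live in the web server configuration or in the site's PHP front controller; the point is just that each old URL gets one deliberate answer instead of a blanket redirect to the homepage.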