Our website was recently hacked and we're trying to clean everything up right now. But when doing a "site:" search, Google still shows the cached Japanese spam pages.
So we tried playing with robots.txt, i.e.:
But when I enter a bad URL in the robots.txt tester, it still allows the URL that we don't want crawled.
Is there any way to make Google crawl only the URLs in the sitemap referenced from robots.txt, without manually listing every bad link under "Disallow"?
Google has never limited itself to crawling and indexing only the URLs listed in a sitemap. Such functionality does not exist, and I doubt it ever will.
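That said, if the injected spam URLs share a common pattern, you may not need to list each one individually: Google's robots.txt parser supports `*` wildcards and `$` end-of-URL anchors in `Disallow` rules. A minimal sketch, assuming hypothetical spam paths (adjust the patterns to match the actual URLs on your site):

```
User-agent: *
# Hypothetical patterns — replace with the actual structure of the injected URLs
Disallow: /jp-spam/
Disallow: /*?keyword=
Disallow: /*.html.bak$

Sitemap: https://example.com/sitemap.xml
```

Keep in mind that robots.txt only prevents future crawling; it does not remove pages that are already indexed or cached. For already-indexed spam URLs, use the Removals tool in Google Search Console, or serve a `404`/`410` (or a `noindex` header) so Google drops them on recrawl. In fact, a `noindex` directive only works if the page is *not* blocked by robots.txt, since Google must be able to fetch the page to see it.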