Disallow crawling of parts of our website by robots
January 25, 2022 at 7:31 pm #5589 Falconmv Participant
We want to disallow crawling of parts of our website by robots. To achieve that, we use this entry in robots.txt: "User-agent: *" followed by "Disallow: /folder-x". Can this entry cause any issues with advertising systems like AdSense, for example, preventing ads from being displayed on /folder-x? Should we specify the user agent that we want to disallow from crawling, or rather specify the ad system's user agent, which we want to allow?

February 25, 2022 at 7:42 pm #5591 WebmasterFocus Keymaster
First off, I don't know specifically how the AdSense side handles this, so I can't confirm or deny how the AdSense crawler would treat that entry. I believe we have this documented in our Help Center, though. As far as I know, the AdSense crawler doesn't look at the generic restrictions, but only at the ones addressed specifically to it.

In general, though, user agents follow the most specific group of directives that applies to them. So if you have one block for User-agent: * and one block for, say, the AdSense bot, then the AdSense bot, when it reads your robots.txt file, takes into account only the section that you have defined specifically for it. All other bots defer to the generic section, because that is the most specific one that matches them. You can use this for the different kinds of Googlebot as well: if you have one section for Google News and one for normal web search, you can control each of them separately.

The same thing happens at the directive level. For a specific URL, we look at your robots.txt file and find the most specific directive that applies to that URL. For example, if you have Disallow: /folder-x and Allow: /folder-x/subfolder-2, and a URL within subfolder-2 comes up, then the most specific match is the second directive, so we follow that one and the URL can be crawled.

This can be a bit tricky in the beginning. What I would recommend is using the robots.txt testing tool in Search Console, which does this more or less for you and tells you either yes, this can be crawled, or no, this is blocked from crawling. You can also edit the robots.txt file at the top of the tool and see how things change.

We have very comprehensive documentation for robots.txt, so if you want to do something more specific or special with your robots.txt file, I would take a look at that and see how you get along.
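
To make the two "most specific wins" rules from this answer concrete, here is a minimal Python sketch. It is not Google's parser and not a full robots.txt implementation: it assumes exact user-agent token matching, plain path prefixes (no * or $ wildcards), and a made-up robots.txt. The names parse_groups, rules_for and is_allowed are hypothetical helpers for this illustration. Mediapartners-Google is used as the AdSense crawler token; the answer itself does not name it, so check the AdSense Help Center before relying on it.

```python
from typing import Dict, List, Tuple

# Illustrative robots.txt (an assumption, not the poster's real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder-x

User-agent: Mediapartners-Google
Disallow:

User-agent: Googlebot
Disallow: /folder-x
Allow: /folder-x/subfolder-2
"""

Rule = Tuple[str, str]  # ("allow" | "disallow", path prefix)


def parse_groups(text: str) -> Dict[str, List[Rule]]:
    """Split robots.txt text into per-user-agent rule lists (simplified)."""
    groups: Dict[str, List[Rule]] = {}
    current_agents: List[str] = []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not last_was_agent:  # a new group starts here
                current_agents = []
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
    return groups


def rules_for(groups: Dict[str, List[Rule]], agent: str) -> List[Rule]:
    """Use the agent's own group if one exists, otherwise the * group."""
    return groups.get(agent.lower(), groups.get("*", []))


def is_allowed(groups: Dict[str, List[Rule]], agent: str, path: str) -> bool:
    """Longest matching prefix wins; on a tie, allow wins; no match means allowed."""
    best_len = -1
    allowed = True
    for kind, prefix in rules_for(groups, agent):
        if not prefix or not path.startswith(prefix):
            continue  # an empty Disallow matches nothing
        if len(prefix) > best_len or (len(prefix) == best_len and kind == "allow"):
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


GROUPS = parse_groups(ROBOTS_TXT)

if __name__ == "__main__":
    for agent, path in [
        ("SomeOtherBot", "/folder-x/page.html"),
        ("Mediapartners-Google", "/folder-x/page.html"),
        ("Googlebot", "/folder-x/page.html"),
        ("Googlebot", "/folder-x/subfolder-2/page.html"),
    ]:
        print(agent, path, is_allowed(GROUPS, agent, path))
```

In this sketch a bot without its own section falls back to the * group and is blocked from /folder-x, the ad crawler reads only its own (empty) Disallow and is not blocked, and for Googlebot the longer Allow prefix overrides the shorter Disallow inside subfolder-2. For a real site, the Search Console tester mentioned above remains the authoritative check.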