WebmasterFocus
Forum Replies Created
-
AuthorPosts
-
WebmasterFocus
Keymaster
Sure. I mean, it's not just the sites the link is coming from that can be a problem. You could have a paid link on a really popular and important website, and it's still a paid link.
And it's not only that. If you have a really high quality site, but the link is actually buried in an archive that nobody ever visits, then it's really not a very high quality link at all, because it's just sitting in a dusty old basement, hidden away. So to me, yes, you can absolutely have a high quality site with a low quality link. I'm not so sure about a low quality site with a high quality link; that's kind of an oxymoron.
WebmasterFocus
Keymaster
Yes, definitely make sure that you're using redirects from the old version to the new version. We have a relatively new section in our Help Center about moving sites and about changing hosting, for example, and I'd definitely go through that so that you understand what we recommend and see how far you can actually go there.
If you're just moving the site from one server to another and you're keeping the same URLs, then of course we won't see any 404s, because the URLs will simply be active on the other server. The only thing you need to watch out for is the overlap between the time when we recognize that a URL is hosted somewhere else and the time when we still have your old IP address essentially cached. During that time it probably makes sense to have the content available on both servers, so that users and our crawlers don't get stuck.
But if you're moving from one domain to another, definitely make sure you set up the redirects so that we can pick up on that connection and don't have to guess at what's actually happening.
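For the domain-to-domain case, a minimal sketch of a site-wide 301 redirect, assuming the old site runs on Apache with .htaccess support (the hostname is a placeholder):

# .htaccess on the old domain (hypothetical hostname)
Redirect 301 / https://www.new-example.com/

Because the Redirect directive matches by path prefix and appends the remaining path, every old URL points at its matching new URL rather than just the homepage, which is what Google needs in order to connect the two versions.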
WebmasterFocus
Keymaster
So I guess, first off, I don't know specifically how the AdSense side handles this, so I can't confirm or deny how the AdSense crawler would look at that. I believe we have this documented in our Help Center, though. As far as I know, the AdSense crawler doesn't look at the generic directives, but rather just looks for its own specific ones.
In general, the way it works is that user agents try to follow the most specific section that applies to them. So if you have one block for the asterisk (*) user agent and one block for, I don't know, the AdSense bot, then the AdSense bot, when it looks at your robots.txt file, would only take into account the section that you have specifically defined for it, and all other bots would defer to the generic one, because that's the only one that's specific enough for them. This is something you can use for the different kinds of Googlebots as well: if you have a section for Google News and a section for normal web search, then you can control that too.
The same thing also happens on a directive level, in that if a specific URL comes up, we will look at your robots.txt file and find the most specific directive that applies to that URL. So if you have, for example, "Disallow: /folder-x/" and "Allow: /folder-x/subfolder-2/", and a URL within subfolder-2 comes up, then the most specific rule would be that second directive, and we would follow that one.
This can be a bit tricky in the beginning. What I'd recommend doing is using the robots.txt testing tool in Search Console, which does this more or less for you and tells you: yes, this is OK and can be crawled, or no, this would be blocked from crawling. Similarly, you can edit the robots.txt file at the top and see how things change. We also have very comprehensive documentation for robots.txt, so if you want to do something more specific or special with your robots.txt file, I'd take a look at that and see how you get along.
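As a rough sketch of the setup described above (the folder names are placeholders, and Mediapartners-Google is the user agent Google documents for the AdSense crawler):

User-agent: *
Disallow: /folder-x/
Allow: /folder-x/subfolder-2/

User-agent: Mediapartners-Google
Disallow:

With a file like this, crawlers that fall under the generic section would skip /folder-x/ except for /folder-x/subfolder-2/, while the AdSense crawler would only read its own section and crawl everything. The robots.txt tester in Search Console can confirm the exact behavior for any given URL.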
February 25, 2022 at 7:28 pm in reply to: Google crawled our site a month ago and found 18,000 404 pages. But we got rid of those pages a few years ago. Is something wrong? #5588
WebmasterFocus
Keymaster
Yeah. So I think maybe a month ago or so, we had something where Googlebot went off and crawled a bunch of really old pages that it should have known don't exist anymore, and that caused a lot of 404 errors. But these 404 errors don't cause any problems on your side, in the sense that from a Search ranking point of view, we don't care about this. If it's a 404, that's a perfect result code for us: we can ignore that page going forward and not crawl it as frequently. So that's not something you need to take action on. You don't need to fix those 404 errors if you know that these are really pages that are supposed to be gone.
February 25, 2022 at 7:24 pm in reply to: A blog can have a lot of duplicate content. Should I use noindex on everything except the main page and the articles themselves? Or is there some better way? #5585
WebmasterFocus
Keymaster
No, I think that's perfectly fine. One thing I'd try to make sure of is that the individual article pages do have something unique about them as well. You can control this a little bit by using excerpts on the category and listing pages and the full text on the article itself. Or if you have a lot of comments, then that is of course also a differentiator, where we would see, well, one page has the article, and the blog post itself has the article plus all of these comments that add additional value.
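If you do go the noindex route for archive or category pages, one common way to do that (just a sketch; the exact mechanism depends on your CMS or SEO plugin) is a robots meta tag in the head of each listing page:

<meta name="robots" content="noindex, follow">

The "follow" part lets the links on those listing pages still be followed, even though the pages themselves stay out of the index.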
February 25, 2022 at 3:34 pm in reply to: Share our original news content with another publisher #5581
WebmasterFocus
Keymaster
If you have no access to their page head to set up a canonical there, then obviously, if you're sharing content, that's duplicate content. It's the same content on both of these websites, and what can happen is that we rank the other website first for some of that content. So that's always a possibility. If you don't want that to happen at all, the best way is to not share your content.
So that's something where, as a website and as a business, you have to look at those options. On the one hand, by sharing your content, maybe you're reaching a bigger audience. On the other hand, by keeping it to yourself, you're making sure that your website is shown first for that content. And that's something you have to figure out on your own: where does it make sense to get a broader audience for this type of content, and where does it make sense to make sure that your name is the primary one associated with it.
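For reference, if the other publisher were ever willing to add it, a cross-domain canonical in the head of their copy pointing back to the original would look roughly like this (the URL is a placeholder):

<link rel="canonical" href="https://www.your-site.example/original-article/">

It's a hint rather than a strict directive, but it gives a strong signal about which copy should be shown.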
February 25, 2022 at 3:30 pm in reply to: Disavow links from websites with wildcard duplications #5578
WebmasterFocus
Keymaster
So I'm not 100% sure what you mean by wildcard domains. I have sometimes seen sites that are more like directory sites, where the subdomain is kind of a category, and there are millions of subdomains, which results in a really complicated structure. If you feel that links from a site like that are causing you problems, maybe because you bought links there in the past and you want them taken out of the picture, then you can use a domain entry in the disavow file and just say: everything from this domain. That will automatically include all of its subdomains as well. On the other hand, if you just want to disavow certain subdomains, then you can use a domain entry with "domain:" followed by the specific subdomain, and we'll take that into account like that.
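As a small sketch of what those disavow file entries could look like (the hostnames are placeholders):

# Disavow everything from the directory site, including all of its subdomains
domain:spammy-directory.example

# Or disavow only one specific subdomain
domain:widgets.spammy-directory.example

The file is plain text with one entry per line, and it gets uploaded through the disavow links tool in Search Console.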
WebmasterFocus
Keymaster
You can do this. It's not something that I'd say is clearly positive or negative. Sometimes people search for PDFs and want to be able to find them. It's not the case that you get a duplicate content penalty for PDFs, and it's probably rare that you'd have the same content ranking as both HTML and PDF on the same search results page. So you could probably just link to those PDFs normally and let them get indexed. If there's content there that people are using as a PDF, then maybe they want to find it in search as well. So I don't have an explicit answer there; that's something where I'd probably do some A/B testing with your users directly.
Regarding link titles: on desktop, the user sees them when hovering over the text, but on a smartphone you obviously can't hover over text. With the mobile-first index, we will probably keep treating them in the same way. I don't know if that will change over time, but at least initially we will index them in the same way.
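For context, a link title here just means the title attribute on an anchor, something like this (the file name is made up):

<a href="/downloads/product-manual.pdf" title="Product manual (PDF, 2 MB)">Product manual</a>

On desktop it shows up as a tooltip on hover; on mobile there's no hover, which is what the question is getting at.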
WebmasterFocus
Keymaster
So you have a site with all of this duplicate content across its subdomains, and other sites are linking to that duplicate content. The question is whether those external sites would be penalized for linking to something that has a lot of duplicate content, and that's usually not the case. If we see an external site, or any site, linking to another piece of content, and we recognize that the other piece of content isn't that great, that's not something we would penalize that site for. It's not that the manual webspam team would come along and say, well, this site is linking to something spammy, therefore it must be spammy too. You can't really invest the time needed to do a full webspam analysis of every external site that you happen to be linking to. So no site should be penalized for linking to something that's kind of spammy or duplicate-content-ish. And if you do clean things up at some point, those changes will only be picked up once we've been able to re-crawl the pages, reprocess them, and recognize that you've improved the markup there.